Preface 



The series of ISCIS (International Symposium on Computer and Information 
Sciences) symposia have been held each year since 1986, mostly in Turkey and 
occasionally abroad. It is the main computer science and engineering meeting 
organized by Turkish academics and was founded by Erol Gelenbe. Each year 
ISCIS attracts a significant number of international participants from all over 
the world. The 19th ISCIS was organized by Bilkent University, Department of 
Computer Engineering, and was held in Kemer-Antalya, Turkey during 27-29 
October 2004. 

For ISCIS 2004, a total of 335 papers went through the review process and 
a large number of high-quality papers competed for acceptance. This volume of 
the Springer Lecture Notes in Computer Science (LNCS) series contains 100 of 
those papers that broadly fall into the following areas of interest: artificial intel- 
ligence and machine learning, computer graphics and user interfaces, computer 
networks and security, computer vision and image processing, database systems, 
modeling and performance evaluation, natural language processing, parallel and 
distributed computing, real-time control applications, software engineering and 
programming systems, and theory of computing. 

The symposium contained three invited talks. The first talk titled “An Ap- 
proach to Quality of Service” was given by Erol Gelenbe of Imperial College Lon- 
don. The second talk titled “Modeling Assumptions in Mobile Networks” was 
given by Satish Tripathi of State University of New York at Buffalo. The third 
talk titled “Combinatorial Scientific Computing: The Role of Computer Science 
Algorithms in Scientific Simulation” was given by Bruce Hendrickson of San- 
dia National Laboratories. The symposium also contained the following special 
sessions: Advanced Real-Time Applications, All Optical Networks, Component- 
Based Distributed Simulation, Mobile Agents: Mechanisms and Modeling, and 
Performance Evaluation of Complex Systems. 

ISCIS 2004 would not have taken place without the contributions of many 
people. We first thank the 60 program committee (PC) members who did an ex- 
cellent job in publicizing the symposium and in helping to attract a large number 
of high-quality submissions. We thank all the authors who contributed papers. 
We especially acknowledge the time and efforts of Maria Carla Calzarossa, Ivo De 
Lotto, Jean-Michel Fourneau, Jane Hillston, Giuseppe Iazeolla, and 
Salvatore Tucci that went into organizing special sessions. We are also very much 
grateful to the PC members and the external referees who provided thorough 
reviews in a short time frame. 

Among the people who contributed locally, we thank Selim Çıracı for setting 
up the Web server used in the submission process, and Barla Cambazoglu and 
Bora Uçar for, among other things, their meticulous work in the proofreading 
process. 

We greatly appreciate the financial support from the Scientific and Technical 
Research Council of Turkey (TUBITAK) and the Turkey Section of the Institute 
of Electrical and Electronics Engineers (IEEE) that was used towards the costs 
of printing, inviting speakers, and registering local IEEE members. 

Finally, we thank all who contributed to the planning and realization of the 
symposium. 

Abstract. Network Quality of Service (QoS) criteria of interest include 
conventional metrics such as throughput, delay, loss, and jitter, as well 
as new QoS criteria based on power utilization, reliability and security. 
In this paper we suggest a theoretical framework for the characteriza- 
tion and comparison of adaptive routing algorithms which use QoS as 
the criterion to select between different paths that connections may take 
from sources to destinations. Our objective is not to analyze QoS, but 
rather to provide routing rules which can improve QoS. We define a QoS 
metric as a non-negative random variable associated with network paths 
which satisfies a sub-additivity condition along each path. Rather than a 
quantity to be minimised (such as packet loss or delay), our QoS metrics 
are quantities that should be maximised (such as the inverse of packet 
loss or delay), similar in spirit to utility functions. We define the QoS 
of a path, under some routing policy, as the expected value of a non- 
decreasing measurable function of the QoS metric. We discuss sensitive 
and insensitive QoS metrics, the former being dependent on the routing 
policy which is used. We describe routing policies simply as probabilis- 
tic choices among all possible paths from some source to some given 
destination. Sensible routing policies are then introduced: they take de- 
cisions based simply on the QoS of each available path. We prove that 
the routing probability of a sensible policy can always be uniquely ob- 
tained. A hierarchy of m-sensible probabilistic routing policies is then 
introduced and we provide conditions under which an (m + 1)-sensible 
policy provides better QoS on the average than an m-sensible policy. 



1 Introduction 

Quality of Service (QoS) has now become a central issue in network design, 
and there is a vast and significant literature on the problem of estimating cer- 
tain specific quality of service parameters (e.g., loss or delay) for given traffic 
characteristics and a given network topology [2,13]. Typically such work has 
considered single buffer models (finite or infinite), or models of cascaded nodes 
with or without interfering traffic. There has also been much work on schemes 
for obtaining better QoS through routing [11,12], on scheduling techniques in 
routers to achieve desired QoS objectives [9], as well as on the analysis of QoS 
resulting from the detailed behavior of protocols such as TCP/IP. 

The mixed wired and wireless network topologies that are becoming com- 
mon, including fixed and ad-hoc connections, create the need to rationally ex- 
ploit dynamically variable routing as a function of network conditions, since the 
applications that use such networks have QoS requirements such as delay, loss 
or jitter, as well as reliability and low power utilization. 

Motivated by our prior work on adaptive network routing algorithms [3]-[7], 
in this paper we investigate some basic mathematical problems concerning QoS 
driven routing. The aim of this work is not to analyze QoS, but rather to show 
that certain randomized routing policies can improve QoS. 

We define QoS metrics as non-negative random variables associated with 
network paths which satisfy a sub-additivity condition along each path. We then 
describe routing policies simply as probabilistic choices among all possible paths 
from some source to some destination. Incremental routing policies are defined 
as those which can be derived from independent decisions along each sub-path. 
We define the QoS of a path, under some routing policy, as the expected value 
of a measurable function of the QoS metric. We discuss sensitive and insensitive 
QoS metrics, the former being dependent on the routing policy which is used. 
Sensible routing policies are then introduced; these policies take decisions based 
simply on the QoS of each allowable path. Finally, a hierarchy of m-sensible 
probabilistic routing algorithms is introduced. The 0-sensible routing policy is 
simply a random choice of routes with equal probability, while the 1-sensible 
policy uses the relative QoS for each alternate route to select a path. An 
m-sensible policy uses the m-th power of the QoS for each alternate path, rather 
than just the 1st power. Thus it simply uses the same information in a different 
manner. It is particularly interesting that we can prove that an (m + 1)-sensible 
policy provides better resulting average QoS than an m-sensible policy, provided 
that the QoS metric is insensitive. We also prove that under certain sufficient 
conditions, the same result holds for sensitive QoS metrics. 



1.1 Quality of Service (QoS) Metrics 

A QoS metric relates to some specific data unit, the most obvious example being 
a packet. However more broadly, a data unit may be a significant sequence of 
packets which belong to the same connection. A QoS metric q can be illustrated 
by the following examples: 

— $q_D$ may be the inverse of the delay $D$ experienced by a packet as it traverses 
some path in the network, or 

— it may be the inverse of the binary variable $q_r = 1[\text{the-path-is-connected}]$, or 

— $q_{LR} = (n/L)$ may be the number of packets sent $n$ divided by the number 
of packets lost $L$ for a sequence of packets, or 

— $q_J$ may be the inverse of the average jitter experienced by $n$ successive packets: 

$$q_J = \left[ \frac{1}{n-1} \sum_{l=2}^{n} \left| (R_l - R_{l-1}) - (S_l - S_{l-1}) \right| \right]^{-1},$$

where $S_l$ is the date at which packet $l$ was sent from the source, and $R_l$ is 
the time at which packet $l$ arrives at its destination, etc., or 

— $q$ may be one over the number of hops a packet has to traverse, or 

— $q$ may be one over the power expended by the network nodes to service and 
forward a packet as it travels through a path in the network, or 

— $q$ may be the inverse of the “effective delay” obtained by composing some of 
these values, such as 

$$q = \left[ (1 - q_{LR})\, q_D + q_{LR}\, (T_0 + q_D) \right]^{-1} = \left[ q_D + q_{LR}\, T_0 \right]^{-1},$$

where $T_0$ is the (large) timeout delay that triggers a packet’s retransmission. 
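
The jitter and effective-delay examples above can be evaluated directly from measurements. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, timestamps are plain Python lists, and in the effective-delay formula the delay and loss ratio are taken as raw (non-inverted) measurements, which is an assumption about how the composed expression is meant to be read.

```python
def jitter_metric(sent, received):
    """Inverse of the average jitter over n successive packets.

    sent[l] and received[l] are the send and arrival times of packet l.
    Returns float('inf') when the measured jitter is exactly zero.
    """
    n = len(sent)
    if n < 2 or len(received) != n:
        raise ValueError("need at least two packets with matching timestamps")
    total = sum(
        abs((received[l] - received[l - 1]) - (sent[l] - sent[l - 1]))
        for l in range(1, n)
    )
    mean_jitter = total / (n - 1)
    return float("inf") if mean_jitter == 0 else 1.0 / mean_jitter


def effective_delay_metric(delay, loss_ratio, timeout):
    """Inverse of the effective delay, [delay + loss_ratio * timeout]^-1.

    Assumption: delay and loss_ratio are raw measurements, and a lost
    packet costs one extra timeout before it is retransmitted.
    """
    return 1.0 / (delay + loss_ratio * timeout)


# Hypothetical measurements for four packets sent one time unit apart.
sent_times = [0.0, 1.0, 2.0, 3.0]
arrival_times = [0.5, 1.5, 2.7, 3.6]
q_jitter = jitter_metric(sent_times, arrival_times)
q_effective = effective_delay_metric(delay=0.05, loss_ratio=0.1, timeout=2.0)
```

Both functions return larger values for better paths, in line with the convention that these metrics are quantities to be maximised.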



1.2 Routing Policies 

Let the nodes in a network be denoted by a finite set of natural numbers 
$\{0, 1, 2, \ldots, N\}$. 

Definition 1. A path in the network starting at node $i$ and ending at node $v_d$ 
is denoted by $V_i = (i, \ldots, v_d)$. It is a sequence of nodes such that the first node 
is $i$, the last node is $v_d$, and no node in the sequence $V_i$ appears more than once. 

We associate QoS metrics with paths in the network. 

Let $FV_i(v_d) = \{V_i^1, V_i^2, \ldots, V_i^m\}$ be the set of all distinct, but not necessarily 
disjoint, paths from node $i$ to node $v_d$ in the network. 

Definition 2. A routing policy for source-destination pair $(i, v_d)$ is a probability 
distribution $\pi^{FV_i(v_d)}$ on the set $FV_i(v_d)$ that selects path $V_i^j \in FV_i(v_d)$ with 
probability $\pi^{FV_i(v_d)}(V_i^j)$ for each individual data unit which is sent from node $i$ 
to node $v_d$. 

For any $V_i^j \in FV_i(v_d)$, we may write $V_i^j = (i, \ldots, l, n, \ldots, v_d)$ as a concatenation 
of a prefix path and a suffix path: $V_i^j = P_i^j . S_n^j$ where $P_i^j = (i, \ldots, l)$, 
$S_n^j = (n, \ldots, v_d)$. Consider now the sets of paths from $i$ to $l$, $FV_i(l)$, and from 
$n$ to $v_d$, $FV_n(v_d)$. Whenever needed, $\Pi$ will denote the routing policy for the 
network as a whole, i.e., the set of rules that assign unique paths for each data 
unit moving from any source node to any destination in the network. $FV$ will 
denote the set of all paths from all possible source to destination nodes in the 
network. 
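
To make Definitions 1 and 2 concrete, the sketch below enumerates the loop-free paths between two nodes and draws one path per data unit from a probability distribution over them. It is an illustration only: the adjacency-dictionary graph, the function names `simple_paths` and `choose_path`, and the four-node topology are all hypothetical.

```python
import random


def simple_paths(graph, source, dest):
    """All paths from source to dest in which no node appears twice.

    graph is an adjacency dict: node -> list of successor nodes.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == dest:
            paths.append(tuple(path))
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # a path never revisits a node
                stack.append((nxt, path + [nxt]))
    return sorted(paths)


def choose_path(policy, rng=random):
    """Draw one path for a single data unit; policy maps path -> probability."""
    paths = list(policy)
    weights = [policy[p] for p in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


# Hypothetical network with nodes 0..3; source 0, destination 3.
graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
paths = simple_paths(graph, 0, 3)
uniform_policy = {p: 1.0 / len(paths) for p in paths}
picked = choose_path(uniform_policy, random.Random(0))
```

Because `choose_path` is called afresh for every data unit, successive decisions are independent, which matches the way a routing policy is applied "for each individual data unit".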



2 QoS Metrics 

Definition 3. A QoS metric for path $V$ is a random variable $q^{\Pi}(V)$ which 
takes values in $[0, +\infty)$, such that for $V = V_1.V_2$ (i.e., $V$ is composed of path 
$V_1$ followed by path $V_2$) 

$$q^{\Pi}(V) \le q^{\Pi}(V_1) + q^{\Pi}(V_2) \quad \text{a.s.}$$

Note that the requirement that the QoS metric be sub-additive covers many 
strictly additive metrics of interest such as packet or cell loss rates, delay, path 
length (number of hops), and power dissipation. Other metrics such as path 
reliability and available bandwidth are sub-additive, and are also covered by our 
definition. For a path $V$ composed of two successive sub-paths $V = V_1.V_2$, the 
following are obviously sub-additive: 

$$\begin{aligned}
q_{\text{available-BW}}(V) &= \inf\left( q_{\text{available-BW}}(V_1),\, q_{\text{available-BW}}(V_2) \right) \quad \text{a.s.}, \\
&\le q_{\text{available-BW}}(V_1) + q_{\text{available-BW}}(V_2) \quad \text{a.s.}, \\
q_r(V) &= [\, q_r(V_1) \text{ and } q_r(V_2) \,] \quad \text{a.s.}, \\
&\le q_r(V_1) + q_r(V_2) \quad \text{a.s.},
\end{aligned}$$

where $q_r(.)$ is treated as a logical binary random value in the third equation, 
and as a numerical (binary) random value in the last equation. 
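
The two sub-additivity claims above can be checked mechanically on sample values. This is a small sketch under stated assumptions: a path is modelled as a list of per-link values, available bandwidth is the minimum over the links, reliability is the logical AND of link states, and all function names are hypothetical.

```python
import itertools


def bandwidth(link_bandwidths):
    """Available bandwidth of a path: the minimum over its links."""
    return min(link_bandwidths)


def connected(link_states):
    """Binary reliability of a path: 1 only if every link is up."""
    return int(all(link_states))


def is_subadditive(metric, first, second):
    """Check metric(V1.V2) <= metric(V1) + metric(V2) for one concatenation."""
    return metric(first + second) <= metric(first) + metric(second)


# One concatenation for bandwidth, and every 2-link + 2-link case for reliability.
bw_ok = is_subadditive(bandwidth, [10.0, 4.0], [6.0, 8.0])
rel_ok = all(
    is_subadditive(connected, list(a), list(b))
    for a in itertools.product([0, 1], repeat=2)
    for b in itertools.product([0, 1], repeat=2)
)
```

The check is only a sanity test on finitely many cases; the inequalities themselves hold for any non-negative link values.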

2.1 QoS Metrics and QoS 

In the sequel $q^{\pi^{FV_i(v_d)}}(V_i^j)$ will denote the QoS metric $q$ measured on path $V_i^j$, 
when the policy $\pi^{FV_i(v_d)}$ is applied to data units travelling from node $i$ to $v_d$ 
using the set of paths $FV_i(v_d)$, $V_i^j \in FV_i(v_d)$. 

We sometimes write $q$ with a subscript, e.g., $q_D^{\pi^{FV_i(v_d)}}(V_i^j)$ or $q_{LR}^{\pi^{FV_i(v_d)}}(V_i^j)$, 
to indicate that it designates some specific metric such as packet delay or packet 
loss rate. 

Definition 4. Let $u$ be a non-decreasing measurable function and $q$ be a QoS 
metric. The QoS for data units sent on the path $V_i^j$ using policy $\pi^{FV_i(v_d)}$, from 
source $i$ to destination $v_d$ along the set of paths $FV_i(v_d)$, is simply the expected 
value $E[u(q^{\pi^{FV_i(v_d)}}(V_i^j))]$, i.e., the expected value of a measurable function of a 
QoS metric. 

The reason for assuming that u is an increasing function (i.e., non-decreasing) 
is that we want the QoS to reflect the trend of the QoS metric. If the QoS 
metric has a larger value reflecting improvement in the path, we want the QoS 
also to reflect this improvement, or at least not to reflect a degradation. 
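
Since Definition 4 defines the QoS as an expectation, it can be estimated by averaging samples. The sketch below is a hedged illustration: `estimate_qos`, the uniform metric distribution, and the capping function used for `u` are all hypothetical choices, not taken from the paper.

```python
import random


def estimate_qos(sample_metric, u, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[u(q)] for one path.

    sample_metric(rng) draws one realization of the QoS metric q;
    u is the non-decreasing function applied to each realization.
    """
    rng = random.Random(seed)
    return sum(u(sample_metric(rng)) for _ in range(n_samples)) / n_samples


# Hypothetical path whose metric is uniform on [1, 3]; u caps the metric at 2.5,
# which is non-decreasing as the definition requires.
qos = estimate_qos(lambda rng: rng.uniform(1.0, 3.0), lambda q: min(q, 2.5))
```

For this particular choice the exact expectation is 1.9375, so the estimate should land close to that value.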

2.2 Sensitive QoS Metrics 

The value for some path of a routing sensitive, or simply sensitive QoS metric 
q increases when the probability of directing traffic into that path increases; 
examples include path delay and path loss ratio. An example of an insensitive 
QoS metric is the number of hops along a path; the power dissipated per data unit 
on a path may also be insensitive. Even when the probability of sending traffic 
down a given path is zero, we may assume that the path can be infrequently 
tested to obtain the value of the QoS metric of that path, or the path QoS may 
be known via prior information (e.g., available bandwidth, number of hops, or 
the path’s power dissipation). 




An Approach to Quality of Service 






Definition 5. We will say that the QoS metric $q$ is sensitive on the set $FV_i(v_d)$, 
if for any two routing policies $\pi^{FV_i(v_d)}$ and $\pi'^{FV_i(v_d)}$ and any path $V_i^j \in FV_i(v_d)$, 
for all $x > 0$: 

$$\{ \pi^{FV_i(v_d)}(V_i^j) < \pi'^{FV_i(v_d)}(V_i^j) \} \Rightarrow P[q^{\pi^{FV_i(v_d)}}(V_i^j) > x] > P[q^{\pi'^{FV_i(v_d)}}(V_i^j) > x].$$

Thus a QoS metric is sensitive if, when the load is increased on the path, 
the resulting QoS gets worse (is smaller). We say that $q$ is insensitive on the set 
$FV_i(v_d)$ if for any path $V_i^j$ and any two routing policies such that 
$\pi^{FV_i(v_d)}(V_i^j) \ne \pi'^{FV_i(v_d)}(V_i^j)$: 

$$P[q^{\pi^{FV_i(v_d)}}(V_i^j) > x] = P[q^{\pi'^{FV_i(v_d)}}(V_i^j) > x], \quad \text{for all } x > 0.$$
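
The distinction in Definition 5 can be illustrated with two toy metrics. This sketch is an assumption-laden simplification: it replaces the distributional condition with deterministic mean values, models a path as a hypothetical M/M/1 queue whose mean delay is 1/(service rate minus offered load), and uses made-up function names.

```python
def inverse_delay_metric(route_prob, arrival_rate, service_rate):
    """Inverse mean delay of a hypothetical M/M/1 path.

    The path receives route_prob * arrival_rate traffic, so the metric
    shrinks as more traffic is routed onto it (sensitive behaviour).
    """
    load = route_prob * arrival_rate
    return max(service_rate - load, 0.0)


def inverse_hop_metric(route_prob, hops):
    """Inverse hop count; route_prob is accepted but deliberately unused,
    because the hop count does not depend on the routing probability."""
    return 1.0 / hops


light = inverse_delay_metric(0.2, arrival_rate=4.0, service_rate=10.0)
heavy = inverse_delay_metric(0.8, arrival_rate=4.0, service_rate=10.0)
```

Here `light` exceeds `heavy`, so the delay-based metric behaves as a sensitive metric, while `inverse_hop_metric` returns the same value for any routing probability and is therefore insensitive.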



3 Sensible Routing Policies 

A sensible routing policy is one which: 

— selects paths only using the expected value of the QoS, i.e., the expected 
value of a function u of the QoS metric q for each path, as the criterion for 
selecting the probability that a path is chosen; 

— selects the path for a new data unit independently of the decision taken for 
the previous data unit. 

The practical motivation for considering sensible routing policies is that (1) 
averages of QoS metrics, or of functions of QoS metrics, are typically easy to 
estimate, and (2) decisions which are successively independent for successive 
data units are easier to implement. 

Definition 6. Let $u$ be a non-decreasing measurable function. A sensible routing 
policy (SRP) from node $i$ to destination $v_d$ based on the QoS metric $q$ is a 
probability distribution $\pi^{FV_i(v_d)}$ on the set $FV_i(v_d)$ such that: 

$$\pi^{FV_i(v_d)}(V_i^j) = f_i^j\left( E[u(q^{\pi^{FV_i(v_d)}}(V_i^1))],\, E[u(q^{\pi^{FV_i(v_d)}}(V_i^2))],\, \ldots,\, E[u(q^{\pi^{FV_i(v_d)}}(V_i^{|FV_i(v_d)|}))] \right), \qquad (1)$$

for a function $f_i^j : R^m \to [0, 1]$, for each $V_i^j \in FV_i(v_d)$, such that 

$$\sum_{V_i^j \in FV_i(v_d)} \pi^{FV_i(v_d)}(V_i^j) = 1, \qquad (2)$$

and for each path $V_i^j$, the function $f_i^j(y_1, \ldots, y_j, \ldots, y_{|FV_i(v_d)|})$ defined in (1) 
is strictly decreasing in its argument $y_j$, with 

$$\lim_{y_j \to +\infty} f_i^j(y_1, \ldots, y_{|FV_i(v_d)|}) = 0.$$

Remark 1. Thus a SRP is a routing policy which decides on routing based only 
on the QoS of each path, such that whenever the value of the QoS for any path 
increases then the probability of selecting that path decreases. 



Example 1. A simple example of a SRP is the following: 

$$\pi^{FV_i(v_d)}(V_i^j) = \frac{\dfrac{1}{E[q^{\pi^{FV_i(v_d)}}(V_i^j)]}}{\sum_{\text{all } s} \dfrac{1}{E[q^{\pi^{FV_i(v_d)}}(V_i^s)]}}, \qquad (3)$$

which says that packets are directed to the paths with a probability which is 
inversely proportional to the average delay. 
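
Example 1 amounts to normalising the reciprocals of the per-path averages. A minimal sketch, assuming the averages are already known and strictly positive; the function name and the sample delays are hypothetical.

```python
def sensible_probabilities(expected_q):
    """Routing probabilities inversely proportional to each path's E[q]."""
    if any(q <= 0 for q in expected_q):
        raise ValueError("expected values must be strictly positive")
    inverses = [1.0 / q for q in expected_q]
    total = sum(inverses)
    return [w / total for w in inverses]


# Hypothetical average delays on three alternative paths.
probs = sensible_probabilities([1.0, 2.0, 4.0])
```

With these values the path with the smallest average delay receives probability 4/7, and the others 2/7 and 1/7.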



In a sensible routing policy, the probability that a specific path is selected will 
depend on the QoS of that path, which in general depends on the policy itself, i.e., 
on the probability that the path is selected, unless the QoS metric is insensitive. Thus 
there is the question of whether we are able to compute the routing probabilities. 
The following theorem provides sufficient conditions for being able to do this. 

Theorem 1. If $\pi^{FV_i(v_d)}$ is a sensible routing policy on $FV_i(v_d)$, then the solution 
to (1) exists and is unique for each path $V_i^j$. 

Proof. For any path $V_i^j$, consider the function $f_i^j(y_1, \ldots, y_j, \ldots, y_{|FV_i(v_d)|})$ of 
equation (1), which (strictly) decreases when $y_j$ increases, and the path QoS 
$y_j(\pi^{FV_i(v_d)}) = E[u(q^{\pi^{FV_i(v_d)}}(V_i^j))]$. Since 

$$\{ \pi^{FV_i(v_d)}(V_i^j) > \pi'^{FV_i(v_d)}(V_i^j) \} \Rightarrow P[q^{\pi^{FV_i(v_d)}}(V_i^j) > x] > P[q^{\pi'^{FV_i(v_d)}}(V_i^j) > x]$$

for all $x > 0$, the path QoS is an increasing function (not strictly) $y_j(\pi)$ of its 
argument, the probability $\pi$, because of (1), and because $u$ is an increasing function. 
Thus the solution of equation (1) for any $V_i^j$ is obtained at the intersection 
of a non-negative, strictly decreasing function $f_i^j$ of $y_j$ which tends to zero, and 
an increasing non-negative function $y_j$ of $f_i^j$. □ 
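
When the metric is sensitive, the routing probabilities and the path QoS determine each other, so the solution of (1) is a fixed point. The sketch below searches for it by plain iteration under strong assumptions that are not in the paper: each path is a hypothetical M/M/1 queue with mean delay 1/(service rate minus routed load), the policy is the inverse-proportional rule of Example 1, and the function name is made up. Theorem 1 asserts existence and uniqueness only; this iteration is not guaranteed to converge in general.

```python
def srp_fixed_point(service_rates, arrival_rate, tol=1e-12, max_iter=1000):
    """Iterate p_j <- (1/W_j(p_j)) / sum_k (1/W_k(p_k)) until it stabilises.

    W_j(p_j) = 1 / (mu_j - arrival_rate * p_j) is the assumed mean delay
    of path j when it carries a fraction p_j of the traffic.
    """
    n = len(service_rates)
    probs = [1.0 / n] * n
    for _ in range(max_iter):
        spare = [mu - arrival_rate * p for mu, p in zip(service_rates, probs)]
        if any(s <= 0 for s in spare):
            raise ValueError("a path is overloaded; delay is unbounded")
        delays = [1.0 / s for s in spare]
        inverses = [1.0 / d for d in delays]
        total = sum(inverses)
        new_probs = [w / total for w in inverses]
        if max(abs(a - b) for a, b in zip(new_probs, probs)) < tol:
            return new_probs
        probs = new_probs
    raise RuntimeError("fixed-point iteration did not converge")


# Two hypothetical paths with service rates 10 and 5, total arrival rate 4.
fixed = srp_fixed_point([10.0, 5.0], 4.0)
```

For this model the fixed point can also be found by hand: the probabilities are proportional to the service rates, giving 2/3 and 1/3.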

4 m-Sensible Routing Policies (m-SRP) 

In this section we extend the concept of a sensible policy to more sophisticated 
usage of QoS to make routing decisions. We construct a hierarchy of m-sensible 
policies, where the 1-sensible policy is just the sensible policy defined earlier, 
and the 0-sensible policy is a random uninformed choice between paths with 
equal probability. What is particularly interesting is that, just by increasing the 
value of m we are guaranteed to achieve better overall QoS, when the QoS metric 
is insensitive. The same result can be obtained in the sensitive case as well under 
certain sufficient conditions. 

Definition 7. For a natural number $m$, an m-sensible routing policy (m-SRP) 
from node $i$ to destination $v_d$ based on the QoS metric $q$ is a probability 
distribution $\pi^{FV_i(v_d)}$ on the set $FV_i(v_d)$ such that 

$$\pi^{FV_i(v_d)}(V_i^j) = \frac{\dfrac{1}{\left( E[u(q^{\pi^{FV_i(v_d)}}(V_i^j))] \right)^m}}{\sum_{\text{all } s} \dfrac{1}{\left( E[u(q^{\pi^{FV_i(v_d)}}(V_i^s))] \right)^m}}. \qquad (4)$$

We will use the notation $\pi^{m\text{-SRP}[FV_i(v_d)]}$ to denote the fact that the policy 
$\pi$ on the set of paths $FV_i(v_d)$ is m-sensible, and the corresponding QoS value 
will be denoted by $q^{m\text{-SRP}[FV_i(v_d)]}(V_i^j)$ for path $V_i^j$. Note that a 0-SRP is just 
a random choice among paths, with equal probability. 
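
Definition 7 generalises the inverse-proportional rule by raising each path's QoS to the power m before normalising. A minimal sketch for the insensitive case, where the per-path QoS values are fixed numbers; the function name and sample values are hypothetical.

```python
def m_sensible_probabilities(path_qos, m):
    """m-SRP probabilities: proportional to 1 / (path QoS)^m."""
    if any(w <= 0 for w in path_qos):
        raise ValueError("QoS values must be strictly positive")
    weights = [1.0 / (w ** m) for w in path_qos]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical QoS values (for example, average delays) on three paths.
qos_values = [1.0, 2.0, 4.0]
p0 = m_sensible_probabilities(qos_values, 0)  # uniform choice
p1 = m_sensible_probabilities(qos_values, 1)  # the 1-sensible policy
p2 = m_sensible_probabilities(qos_values, 2)  # sharper preference
```

Raising m concentrates more probability on the path with the smallest QoS value: here the first path gets 1/3, 4/7 and 16/21 for m = 0, 1 and 2.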



4.1 The m-Sensible Routing Theorem when the QoS Metric Is 
Insensitive 



In this section we assume that q is insensitive on the set FVi(vd), and consider 
m-SRP routing policies as defined in (4). 

To simplify the notation, let us associate the index $j$ with the path $V_i^j$ and 
write 

$$W_j(m) = E[u(q^{m\text{-SRP}[FV_i(v_d)]}(V_i^j))]. \qquad (5)$$

When $q$ is insensitive, we will simply write $W_j$. Using (4) and (5) we have 

$$Q^{m\text{-SRP}[FV_i(v_d)]} = \frac{\sum_{j=1}^{n} \dfrac{W_j}{W_j^m}}{\sum_{j=1}^{n} \dfrac{1}{W_j^m}}. \qquad (6)$$



We first prove the following simple result. 



Lemma 1. For any $W_j > 0$, $W_k > 0$, 

$$\frac{1}{2}(W_j + W_k) \ge \frac{2}{\dfrac{1}{W_j} + \dfrac{1}{W_k}},$$

or 

$$\frac{W_j}{W_k} + \frac{W_k}{W_j} \ge 2.$$

Proof. Since $(W_j - W_k)^2 \ge 0$, we have $(W_j^2 + W_k^2) \ge 2 W_j W_k$, and therefore 
$(W_j + W_k)^2 \ge 4 W_j W_k$, or 

$$\frac{1}{2}(W_j + W_k) \ge \frac{2}{\dfrac{1}{W_j} + \dfrac{1}{W_k}},$$

and therefore 

$$(W_j + W_k)\left( \frac{1}{W_j} + \frac{1}{W_k} \right) \ge 4,$$

which can be written as 

$$2 + \frac{W_j}{W_k} + \frac{W_k}{W_j} \ge 4,$$

thus completing the proof. □ 

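
We will call the following result the m-SRP theorem (m-sensible routing 
theorem) for insensitive metrics. 

Theorem 2. If $q$ is insensitive on the set $FV_i(v_d)$, the policy $(m+1)$-SRP is 
better than $m$-SRP for $m \ge 1$, i.e., 

$$Q^{m\text{-SRP}[FV_i(v_d)]} \ge Q^{(m+1)\text{-SRP}[FV_i(v_d)]}.$$

Proof. From Lemma 1, we have that for any $W_j > 0$, $W_k > 0$, 

$$\frac{W_j}{W_k} + \frac{W_k}{W_j} \ge 2,$$

and multiplying both sides by $1/(W_j^m W_k^m)$ we obtain 

$$\frac{1}{W_j^{m-1} W_k^{m+1}} + \frac{1}{W_j^{m+1} W_k^{m-1}} \ge \frac{2}{W_j^m W_k^m}.$$

Summing for $j, k = 1, \ldots, n$ and adding identical terms on both sides, we have 

$$\sum_{j=1}^{n} \frac{1}{(W_j^m)^2} + \sum_{j,k=1;\, j<k}^{n} \left[ \frac{1}{W_j^{m-1} W_k^{m+1}} + \frac{1}{W_j^{m+1} W_k^{m-1}} \right] \ge \sum_{j=1}^{n} \frac{1}{(W_j^m)^2} + \sum_{j,k=1;\, j<k}^{n} \frac{2}{W_j^m W_k^m},$$

or 

$$\left( \sum_{j=1}^{n} \frac{1}{W_j^{m-1}} \right) \left( \sum_{j=1}^{n} \frac{1}{W_j^{m+1}} \right) \ge \left( \sum_{j=1}^{n} \frac{1}{W_j^m} \right)^2.$$

This can be written as 

$$\frac{\sum_{j=1}^{n} \dfrac{1}{W_j^{m-1}}}{\sum_{j=1}^{n} \dfrac{1}{W_j^m}} \ge \frac{\sum_{j=1}^{n} \dfrac{1}{W_j^m}}{\sum_{j=1}^{n} \dfrac{1}{W_j^{m+1}}},$$

or in the final form 

$$\frac{\sum_{j=1}^{n} \dfrac{W_j}{W_j^m}}{\sum_{j=1}^{n} \dfrac{1}{W_j^m}} \ge \frac{\sum_{j=1}^{n} \dfrac{W_j}{W_j^{m+1}}}{\sum_{j=1}^{n} \dfrac{1}{W_j^{m+1}}},$$

which completes the proof. □ 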
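
The inequality of Theorem 2 is easy to check numerically for an insensitive metric, where the per-path values do not depend on the policy. The sketch below evaluates the average QoS obtained under an m-SRP for increasing m; the function name and the three sample values are hypothetical, and n is simply the number of paths.

```python
def average_qos(path_qos, m):
    """Average QoS under an m-SRP with fixed (insensitive) path values:
    sum_j W_j / W_j^m divided by sum_j 1 / W_j^m."""
    weights = [1.0 / (w ** m) for w in path_qos]
    return sum(w * x for w, x in zip(path_qos, weights)) / sum(weights)


# Hypothetical per-path QoS values; smaller is better in this formulation.
values = [1.0, 2.0, 4.0]
sequence = [average_qos(values, m) for m in range(6)]
```

For these values the sequence starts at 7/3 for m = 0, drops to 12/7 for m = 1 and 4/3 for m = 2, and keeps decreasing, as the theorem predicts.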

It is obvious that for an insensitive QoS metric, selecting m to be very large 
is good, since this will lead to choosing the path with the best QoS if such a 
path exists. We summarize this point in the following remark. However, if the 
QoS metric is sensitive then the matter is quite different, as will be discussed in 
the next section. 



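Remark 2. Suppose that $q$ is insensitive on the set $FV_i(v_d)$, and that path $V_i^1$ 
is best in the following sense: 

$$W_1 < W_2, \ldots, W_n.$$

Then $\lim_{m \to \infty} Q^{m\text{-SRP}[FV_i(v_d)]} = W_1$. 

Proof. Using 

$$Q^{m\text{-SRP}[FV_i(v_d)]} = \frac{\sum_{j=1}^{n} \dfrac{W_j}{W_j^m}}{\sum_{j=1}^{n} \dfrac{1}{W_j^m}} \qquad (7)$$

and multiplying the numerator and the denominator by $W_1^m$, 

$$Q^{m\text{-SRP}[FV_i(v_d)]} = \frac{W_1 + \sum_{j=2}^{n} W_j \left( \dfrac{W_1}{W_j} \right)^m}{1 + \sum_{j=2}^{n} \left( \dfrac{W_1}{W_j} \right)^m} \qquad (8)$$

yields the result when we take $m \to \infty$, since $W_1 / W_j < 1$ for $j \ge 2$. □ 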
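
The limit in Remark 2 can be observed numerically by dividing every term by the best path's value raised to the power m, which avoids very large intermediate numbers. This is a sketch under the same insensitivity assumption; the function name and sample values are hypothetical.

```python
def limit_form_qos(path_qos, m):
    """Average QoS under an m-SRP, computed with ratios (best / W_j)^m.

    Equivalent to sum_j W_j / W_j^m over sum_j 1 / W_j^m, but every
    ratio lies in (0, 1], so large m does not overflow.
    """
    best = min(path_qos)
    ratios = [(best / w) ** m for w in path_qos]
    return sum(w * r for w, r in zip(path_qos, ratios)) / sum(ratios)


# Hypothetical per-path values; the first path is strictly the best.
values = [1.0, 2.0, 4.0]
small_m = limit_form_qos(values, 2)
large_m = limit_form_qos(values, 200)
```

With m = 200 the result is numerically indistinguishable from the best path's value, 1.0, while m = 2 still gives 4/3.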



4.2 The Sensible Routing Theorem for Sensitive QoS Metrics 

When the QoS metric is sensitive, the QoS varies with the load on the paths. 
This is of course the most common situation in practice, e.g., for QoS metrics 
such as delay, packet or cell loss, etc. Thus we cannot generalize Theorem 2 to 
the case where the QoS metric is sensitive. However we can provide necessary 
and sufficient conditions which will yield a similar result. 



Theorem 3. If $q$ is sensitive on the set $FV_i(v_d)$, the policy $(m+1)$-SRP is 
better than $m$-SRP for $m \ge 0$: 

$$Q^{m\text{-SRP}[FV_i(v_d)]} \ge Q^{(m+1)\text{-SRP}[FV_i(v_d)]}$$

provided that the following condition holds: 

$$\left[ \sum_{j=1}^{n} \frac{1}{(W_j(m+1))^{m-1}} - \sum_{j=1}^{n} \frac{1}{(W_j(m))^{m-1}} \right] \sum_{j=1}^{n} \frac{1}{(W_j(m+1))^{m+1}} \le \left[ \sum_{j=1}^{n} \frac{1}{(W_j(m+1))^{m}} - \sum_{j=1}^{n} \frac{1}{(W_j(m))^{m}} \right] \sum_{j=1}^{n} \frac{1}{(W_j(m+1))^{m}}.$$



5 Conclusions 

In this paper we suggest a theory of routing based on QoS. We have distinguished 
between QoS metrics, and QoS. Variable and adaptive routing have again become 
of interest in networking because of the increasing importance of mobile ad- 
hoc networks. In this paper we have developed a framework for the study of 
adaptive routing algorithms which use the expected QoS to select paths to their 
destination. Our objective is not to analyze QoS, but rather to design randomized 
routing policies which can improve QoS. We define QoS metrics as non-negative 
random variables associated with network paths that satisfy a sub-additivity 
condition along each path. We define the QoS of a path as the expected value 
of a non-decreasing measurable function of the QoS metric. We discuss sensitive 
and insensitive QoS metrics, the former being dependent on the routing policy 
which is used. An example of an insensitive QoS metric is the number of hops 
on a path, since it will not change with the fact that this particular path is 
selected by the route selection. We describe routing policies as probabilistic 
choices among all possible paths from some source to some given destination. 
Sensible routing policies are then introduced: they take decisions based simply 
on the QoS of each possible path. We prove that the routing probability of 
a sensible policy can always be uniquely determined. A hierarchy of m-sensible 
probabilistic routing policies is then introduced. A 0-sensible policy is simply a 
random choice of routes with equal probability, while a 1-sensible policy selects 
a path with a probability which is inversely proportional to the (expected) QoS 
of the path. We prove that an (m + 1)-sensible policy provides better QoS on 
the average than an m-sensible policy, if the QoS metric is insensitive. We also 
show that under certain conditions, the same result also holds for sensitive QoS 
metrics. 


Abstract. The ‘curse of dimensionality’ problem in the shape-space representation 
used by many network-based Artificial Immune Systems (AISs) strongly degrades 
classification performance. In this paper, to increase classification accuracy, 
we aim to minimize the effect of this problem by developing an Attribute 
Weighted Artificial Immune System (AWAIS). To evaluate the performance of the 
proposed system, aiNet, an algorithm that holds an important place among 
network-based AIS algorithms, was used for comparison with our algorithm. Two 
artificial data sets used with aiNet, the Two-spirals data set and the Chainlink 
data set, were applied in the performance analyses; in terms of the number of 
network units needed to represent the data, the classification performance was 
higher than that of aiNet. Furthermore, to evaluate the performance of the 
algorithm in a real-world application, the wine data set taken from the UCI 
Machine Learning Repository was used. For the artificial data sets, the proposed 
system reached 100% classification accuracy with only a few network units, and 
for the real-world wine data set the algorithm obtained 98.23% classification 
accuracy, which is a very satisfying result considering that the maximum 
classification accuracy obtained with other systems is 98.9%. 



1 Introduction 

Artificial Immune Systems (AISs) form a relatively new area of artificial 
intelligence that is advancing steadily. Many AIS algorithms model the recognition 
and learning mechanisms of the immune system. As a representation method for 
immune system cells, the shape-space approach is used in many AIS classification 
algorithms. The shape-space model, which was proposed by Perelson and Oster in 
1979 [1], is used as a representation mechanism modeling the interactions between 
two cells in the immune system. 

The ‘curse of dimensionality’ is among the main problems of classification systems in 
which a distance criterion is used as a metric [2,3]. A single attribute value in shape-space 
can cause two data items of the same class to be distant from each other and therefore to be 
recognized and classified by different system units. In this paper, we aimed to 
reach higher classification accuracy by assigning weights to the attributes that are 
important in classification. This was done through some modifications to the affinity 
measures of AISs, resulting in a new system named AWAIS (Attribute Weighted 
Artificial Immune System). 

aiNet, one of the algorithms in which shape-space representation is used, is both a 
network-based and population-based AIS algorithm intended for data mining and 
classification applications [4]. To analyze the effect of the novelties introduced by 
AWAIS on classification performance, the artificial data sets used by aiNet were 
applied to AWAIS and the resulting system performance was compared with that of 
aiNet. According to the performance analyses, AWAIS performed better both in the 
minimum number of system units at which classification can be done with high 
accuracy and in the processing time of the algorithm. Besides these comparison data 
sets, a real-world application of the algorithm was performed with the wine data set, 
which consists of the results of a chemical analysis of wines grown in the same region 
in Italy but derived from three different cultivars [5]. Performance results for this 
application showed a considerably satisfying classification accuracy, 98.23%, with 
respect to the other classifiers. 

This paper is organized as follows. In Section 2, natural and artificial immune 
systems are introduced and shape-space representation is explained. Section 3 is 
devoted to the comparison algorithm, aiNet, and its deficiencies. The AWAIS 
algorithm is proposed in Section 4. Section 5 contains the experimental results and 
analyses. In Section 6, the experimental results are discussed and future work is outlined. 



2 Natural and Artificial Immune System 

The natural immune system is a distributed novel-pattern detection system with 
several functional components positioned in strategic locations throughout the body [6]. 
The immune system regulates the defense mechanism of the body by means of innate and 
adaptive immune responses. Of these, the adaptive immune response is much more 
important for us because it contains metaphors like recognition, memory acquisition, 
diversity, and self-regulation. The main architects of the adaptive immune response are 
lymphocytes, which divide into two classes, T and B lymphocytes (cells), each 
having its own function. B cells in particular are of great importance because of their 
secreted antibodies (Abs), which take very critical roles in the adaptive immune response. 
For detailed information about the immune system, refer to [7]. 

Artificial Immune Systems emerged in the 1990s as a new computational research 
area. Artificial Immune Systems link several emerging computational fields inspired 
by biological behavior such as Artificial Neural Networks and Artificial Life [8]. 

In the studies conducted in the field of AIS, B cell modeling is the most encoun- 
tered representation type. Different representation methods have been proposed in 
that modeling. Among these, shape-space representation is the most commonly used 
one [1]. 

The shape-space model (S) aims at quantitatively describing the interactions 
among antigens (Ags), the foreign elements that enter the body such as microbes, and 
antibodies (Ag-Ab). The set of features that characterize a molecule is called its 
generalized shape. The Ag-Ab representation (binary or real-valued) determines a 
distance measure to be used to calculate the degree of interaction between these 
molecules. Mathematically, the generalized shape of a molecule ($m$), either an 
antibody or an antigen, can be represented by a set of coordinates 
$m = \langle m_1, m_2, \ldots, m_L \rangle$, which can be regarded as a point in an 
$L$-dimensional real-valued shape-space ($m \in S^L$). In this work, we used real 
strings to represent the molecules. Antigens and antibodies were considered to be of 
the same length $L$. The length and cell representation depend upon the problem [6]. 



3 AiNet and Its Deficiencies 

In the aiNet algorithm proposed by De Castro and Von Zuben [4,9], antibodies 
represent the internal image of the input space. Antibodies, the system units, are called 
‘nodes’, which have connections with some other nodes. In connected node pairs, 
called ‘edges’, connection weights define the strength of the interaction between the 
nodes in that edge. 

aiNet was developed as a data mining approach, and the responsibility of the 
algorithm was to represent the input data with fewer memory units carrying the same 
class distribution as the original data before classification. So, for the algorithm to be 
used as a classification system alone, some class-analysis methods must be used after 
training to determine the system units of each class. In the test phase, the class of the 
memory unit with the smallest distance to the presented data item is given as the class 
of that item. So, determination of system units is sensitive to the ‘curse of 
dimensionality’ problem due to the distance criterion. 
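
The test-phase rule described above, and its weakness, can be sketched in a few lines. This is an illustration only and not the aiNet implementation: the memory units, their labels, the function names, and the use of plain Euclidean distance are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two real-valued vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, memory_units):
    """Return the class of the memory unit closest to the sample.

    memory_units is a list of (vector, class_label) pairs.
    """
    _, label = min(memory_units, key=lambda unit: euclidean(sample, unit[0]))
    return label


# Hypothetical memory units; the third attribute is irrelevant to the class.
memory = [((0.0, 0.0, 0.0), "A"), ((1.0, 1.0, 5.0), "B")]
near_a = classify((0.1, 0.1, 0.0), memory)
flipped = classify((0.1, 0.1, 5.0), memory)
```

The second sample agrees with class A on the two informative attributes, yet the single irrelevant attribute dominates the distance and the nearest unit becomes the class B one, which is the kind of effect the ‘curse of dimensionality’ remark refers to.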

The main evident deficiencies of aiNet, as emphasized by the authors in [4], are the 
high number of parameters determined by the user and the processing overhead of 
each iteration. Besides, shape-space representation makes class-analysis 
methods such as MST or dendrogram analysis necessary if the algorithm is to be used 
for classification. 



4 AWAIS (Attribute Weighted Artificial Immune System) 

As mentioned before, most network-based AIS algorithms use shape-space
representation, and the ‘curse of dimensionality’ problem that inevitably appears
with it affects the system performance [4,10,11]. The AWAIS algorithm, proposed to
minimize the effect of this problem, is a supervised Artificial Immune System based
on an attribute weighted distance criterion. The supervision in the algorithm shows
itself in determining the weights of attributes and in developing memory cells
during training, both of which take the class of the input data into account.
AWAIS is a two-level classification system in which the attribute weights of each
class are formed at one level and a training procedure with these weights takes
place at the other.
4.1 Attribute Weighting

In most real-valued shape-space representations, the distance between two points is
calculated by the Euclidean distance criterion (Eq. (1)):

$$D = \sqrt{\sum_{i=1}^{L} (ab_i - ag_i)^2} \qquad (1)$$

where $ab$ and $ag$ are the two points in the shape-space, each represented by a
vector, and $L$ is the length of these vectors. According to this formula, all of
the attributes have the same effect in determining distance. However, there are
data sets in which some attributes have no effect on the class of the data while
other attributes are more important in determining the class. So, if higher weights
are assigned to the attributes that are more important in determining a class, and
if these weights are used in the calculation of distance, two data of the same
class that are distant according to the Euclidean norm can be prevented from being
misclassified. Starting from this point, the proposed attribute weighting rests on
the following basis: if an attribute does not change very much among the data of
one class, this attribute is one of the characteristic attributes of that class and
it must have a higher weight than the others.

The attribute weighting procedure applied in AWAIS is as follows:

(1) Normalize each attribute in the data set to the range 0-1.

(2) Determine the antigens of each class: Ag_class_j (j: 1, ..., n; n: number of classes).

(3) For each class do:

Let Ag_class (L x Nc) be the matrix that holds the antigens of that class
(L: number of attributes, Nc: number of Ags in that class).

(3.1) For the i-th attribute do (i: 1, ..., L):

Evaluate the standard deviation of the i-th attribute with Eq. (2):

$$std\_dev_{ji} = \sqrt{\frac{1}{N_c} \sum_{k=1}^{N_c} \left( Ag_{k,i} - mean(Ag_i) \right)^2} \qquad (2)$$

Here $Ag_{k,i}$ is the i-th attribute of the k-th Ag in the j-th class; $mean(Ag_i)$
is the mean of the i-th attribute of all Ags in the j-th class.

Calculate the weights as follows:

$$w_{ji} = 1 / std\_dev_{ji}, \quad (i = 1, \ldots, L;\ j = 1, \ldots, n) \qquad (3)$$

(3.2) Normalize the weights of the j-th class.

The calculated matrix $w$ (n x L) is a normalized weight matrix holding the weights
of each attribute for each class, and this matrix is used in the distance
calculations of the AWAIS training algorithm.

In the attribute weighting procedure, the attributes are in effect normalized for
each class by their standard deviation. By doing so, each class has its own set of
attribute weights.
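The weighting steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab implementation: the function name `attribute_weights`, the small constant `eps` that guards against a zero standard deviation, and the choice to normalize the weights of each class so that they sum to 1 are all assumptions, since the text does not specify them.

```python
import math

def attribute_weights(antigens, labels, eps=1e-6):
    """Return {class_label: [w_1, ..., w_L]} following steps (1)-(3.2).

    eps and the sum-to-one normalization are assumptions of this sketch.
    """
    n_attr = len(antigens[0])
    # Step (1): min-max normalize every attribute to [0, 1].
    lo = [min(row[i] for row in antigens) for i in range(n_attr)]
    hi = [max(row[i] for row in antigens) for i in range(n_attr)]
    scaled = [
        [(row[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
         for i in range(n_attr)]
        for row in antigens
    ]
    weights = {}
    for cls in sorted(set(labels)):
        # Step (2): collect the antigens of this class.
        members = [row for row, y in zip(scaled, labels) if y == cls]
        raw = []
        for i in range(n_attr):
            # Step (3.1): standard deviation of attribute i, then w = 1 / std_dev.
            mean = sum(r[i] for r in members) / len(members)
            var = sum((r[i] - mean) ** 2 for r in members) / len(members)
            raw.append(1.0 / (math.sqrt(var) + eps))
        # Step (3.2): normalize the weights of this class.
        total = sum(raw)
        weights[cls] = [w / total for w in raw]
    return weights
```

An attribute that barely varies inside a class receives most of that class's weight, which is the behavior the procedure is designed to produce.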
4.2 AWAIS Training Algorithm

The originality of AWAIS lies in forming memory antibodies by using the attribute
weights and in knowing the classes of the formed memory antibodies.

The training procedure of the algorithm consists of the following steps:

(1) For each Ag_i do (i: 1, ..., N):

(1.1) Determine the class of Ag_i. Call the memory Abs of that class and calculate
the distance between Ag_i and these memory Abs with Eq. (4):

$$D = \sqrt{\sum_{k=1}^{L} w_{jk} \left( Ab_{j,k} - Ag_{i,k} \right)^2} \qquad (4)$$

Here $Ab_{j,k}$ and $Ag_{i,k}$ are the k-th attributes of Ab_j and Ag_i
respectively; $w_{jk}$ is the weight of the k-th attribute that belongs to the
class of Ab_j.

(1.2) If the minimum of the distances calculated above is less than a threshold
value called the suppression value (supp), return to step 1.

(1.3) Form a memory Ab for Ag_i:

At each iteration do:

(1.3.1) Make a random Ab population with Ab = [Ab_mem ; Ab_rand] and calculate the
distances of these Abs to Ag_i.

(1.3.2) Select the m nearest Abs to Ag_i; clone and mutate these Abs (Ab_mutate).

(1.3.3) Keep the m Abs in the Ab_mutate population nearest to Ag_i as the temporary
memory population Ab_mem.

(1.3.4) Define the nearest Ab to Ag_i as Ab_cand, the candidate memory Ab for Ag_i,
and stop the iterative process if the distance of Ab_cand to Ag_i is less than a
threshold value called the stopping criterion (sc).

(1.3.5) Concatenate Ab_cand as a new memory Ab to the memory matrix of the class of
Ag_i.

(1.4) Stop training.
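Steps (1.1) and (1.2) can be illustrated with a short Python sketch. It assumes that the class weight multiplies the squared attribute difference inside the square root, which is one plausible reading of the attribute weighted distance; the function names `weighted_distance` and `needs_new_memory_cell` are hypothetical.

```python
import math

def weighted_distance(ab, ag, w):
    """Attribute weighted Euclidean distance between an antibody and an antigen.

    Placing the weight inside the root is an assumption of this sketch.
    """
    return math.sqrt(sum(wk * (a - g) ** 2 for a, g, wk in zip(ab, ag, w)))

def needs_new_memory_cell(ag, memory_abs, w, supp):
    """Step (1.2): True when no memory Ab of the class lies closer than supp."""
    if not memory_abs:
        return True
    return min(weighted_distance(ab, ag, w) for ab in memory_abs) >= supp
```

A larger suppression value makes it more likely that an existing memory Ab already covers the antigen, so fewer memory cells are formed.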

The mutation mechanism of the algorithm, which is used in many AIS algorithms and
is called hypermutation, is performed in proportion to the distance between two
cells (Eq. (5)):

$$Ab'_{jk} = Ab_{jk} \pm D_{ji} \cdot Ab_{jk} \qquad (5)$$

Here $Ab'_{jk}$ is the new value and $Ab_{jk}$ is the old value of the k-th
attribute of the j-th Ab. $D_{ji}$ stands for the distance between Ag_i and Ab_j.
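A minimal Python sketch of such a distance-proportional mutation is given below. It assumes that each attribute is shifted by the distance times its own value, with the sign drawn at random per attribute; the sign rule and the function name `hypermutate` are assumptions, not details taken from the text.

```python
import random

def hypermutate(ab, distance, rng=random):
    """Return a mutated copy of ab.

    Each attribute moves by distance * value; the random sign per attribute
    is an assumption of this sketch.
    """
    return [x + rng.choice((-1.0, 1.0)) * distance * x for x in ab]
```

Antibodies that are already close to the antigen (small distance) are changed only slightly, while distant ones are perturbed more strongly.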

The affinity measure used is no longer a pure Euclidean distance, because the
attribute weights enter the distance criterion. As another important point, the
classes of the memory Abs in AWAIS are known after training with the aid of a
labeling vector that records which memory Abs belong to which class. This removes
the need for extra class-analysis methods after training, unlike aiNet.
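With labeled memory Abs, a test sample can be classified directly. The sketch below assumes a nearest-memory-cell rule, as described earlier for the test phase of aiNet, combined with the class-specific weights of each memory Ab; the text does not spell out the AWAIS test phase, so this rule and the names `classify`, `ab_labels` and `class_weights` are assumptions.

```python
import math

def classify(sample, memory_abs, ab_labels, class_weights):
    """Return the label of the memory Ab nearest to sample.

    The distance to each memory Ab uses the attribute weights of that Ab's
    class (an assumption of this sketch).
    """
    best_label, best_dist = None, math.inf
    for ab, label in zip(memory_abs, ab_labels):
        w = class_weights[label]
        d = math.sqrt(sum(wk * (a - s) ** 2 for a, s, wk in zip(ab, sample, w)))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Here `ab_labels` plays the role of the labeling vector: entry j holds the class of memory Ab j.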
5 AWAIS Performance Analysis

The classification performance of AWAIS was analyzed on three data sets. Two of
these were selected to compare the proposed algorithm with another AIS, aiNet,
which is in the same category as AWAIS. These artificial data sets are the
Two-spirals and Chainlink data sets, which were used in the performance analysis of
aiNet [4]. In the analyses, classification accuracy and memory cell number are
given with respect to the suppression parameter (supp or s) in tabulated and
graphical forms. Compression rates are also determined from the number of memory
cells (m) that represent the input data set, and compared with the compression
rates reached by the aiNet algorithm for the same data sets. In addition to these
data sets, one more application, on the wine data set, was performed to see the
real-world performance of the algorithm.



5.1 Two-Spirals Data Set 



This data set consists of 190 data points [12]; 130 of these were used for training
and the remaining 60 were used as the test set. The data belong to two classes. The
classification performance of AWAIS for this set is given in Table 1. The
classification accuracy and the memory cell number with respect to the supp
parameter for the test set are given in Fig. 1(a) and Fig. 1(b), respectively.




Fig. 1. Performance analysis of AWAIS for test data set of Two-spirals data set:
(a) Classification accuracy versus suppression value, (b) Number of memory cells
versus suppression value



In Fig. 3(a), the Two-spirals data set and the memory cells that give the best
classification performance are presented.
Table 1. Performance analysis of AWAIS for Two-Spirals data set

Suppression   Number of          Compression   Accuracy (%)
value (s)     memory cells (m)   rate (%)
1             7                  96.31         42
0.8           10                 94.73         45
0.6           14                 92.63         77
0.4           24                 87.36         98
0.36          27                 85.78         100
0.2           43                 77.36         100



5.2 Chainlink Data Set 

This data set contains 1000 data points [12]; 500 of these were allocated for
training and the other 500 for testing. Again, two classes exist in the set. The
performance results of AWAIS for the Chainlink data set are given in Table 2. The
Chainlink data set and the memory cells giving the best classification accuracy are
shown in Fig. 3(b). The classification accuracy and memory cell number versus
suppression value are given in graphical form in Fig. 2 for the test set.



Table 2. Performance analysis of AWAIS for Chainlink data set

Suppression   Number of          Compression   Accuracy (%)
value (s)     memory cells (m)   rate (%)
1             2                  99.8          86.2
0.8           4                  99.6          100
0.6           6                  99.4          100
0.4           9                  99.1          100
0.2           30                 97.0          100





Fig. 2. Performance analysis of AWAIS for test data set of Chainlink data set:
(a) Classification accuracy versus suppression value, (b) Number of memory cells
versus suppression value

Fig. 3. (a) Two-spirals data set and memory cells (b) Chainlink data set and memory cells

According to the experimental results, it is encouraging to see that the system
reached 100% classification accuracy with a remarkably small number of network
units. As for processing time, the system was able to form memory units in 4 sec
for the Two-spirals and 11 sec for the Chainlink data set in Matlab 6.5 on an Intel
Pentium II computer. This is also an advantage of the system over aiNet, whose
processing times were 14 sec and 23 sec, respectively, for the same problems on the
same computer.

Table 3 gives the compression rates of aiNet and AWAIS for the Two-spirals and
Chainlink data sets. As can be seen from the table, AWAIS can classify the data
sets with the same accuracy as aiNet using fewer system units.

Table 3. Comparison of the compression rates of aiNet and AWAIS algorithms for
Chainlink and Two-spirals data sets

Data set      aiNet (%)                 AWAIS (%)
Chainlink     94.5 ((1000-55)/1000)     99.6 ((1000-4)/1000)
Two-spirals   74.21 ((190-49)/190)      85.78 ((190-27)/190)
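The compression rates in Table 3 follow the expression (N - m)/N, where N is the number of data points and m the number of memory cells. As a small worked example, the hypothetical helper below reproduces the AWAIS entries.

```python
def compression_rate(n_data, n_memory_cells):
    """Compression rate in percent: share of the data not kept as memory cells."""
    return 100.0 * (n_data - n_memory_cells) / n_data

# AWAIS entries of Table 3
chainlink = compression_rate(1000, 4)    # (1000-4)/1000
two_spirals = compression_rate(190, 27)  # (190-27)/190
```

The Two-spirals value evaluates to roughly 85.79, which appears in the tables truncated to 85.78.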



5.3 Wine Data Set 

In the experimental part of this study, the artificial data sets above were used
for the performance analyses of the proposed algorithm as well as for comparison
with aiNet. To analyze the behavior of AWAIS on a real-world problem, the wine data
set was also used in the experiments. This data set consists of the results of a
chemical analysis of wines grown in the same region in Italy but derived from three
different cultivars [5]. The analysis determined the quantities of 13 constituents
found in each of the three types of wines. This data set was applied to AWAIS with
the 10-fold cross validation (10 CV) method, and the classification accuracies of
the algorithm were analyzed for different values of the supp parameter. The maximum
classification accuracy reached by AWAIS was 98.23%, for a supp value of 0.1.
Various other methods have also been applied to this classification problem; the
classification accuracies they obtained are shown in Table 4 together with the
result reached by AWAIS [13]. All of these methods used the 10 CV scheme, as AWAIS
did. As can be seen from the table, the classification accuracy of AWAIS is
comparable with that of the other classification methods for this problem, and this
result may be improved by some modifications to the algorithm, possibly resulting
in a higher accuracy than all of the methods.

Table 4. Classification accuracies obtained by different methods for wine data set
with AWAIS performance result

Method                          Classification accuracy (%)
kNN, Manhattan, auto k=1-10     98.9±2.3
10 CV SSV, opt prune            98.3±2.7
AWAIS                           98.2±3.0
kNN, Euclidean, k=1             97.8±2.8
IncNet, 10CV, def, bicentral    97.2±2.9
FSM a=.99, def                  96.1±3.7
kNN, Euclidean, k=1             95.5±4.4
10 CV SSV, opt node, BFS        92.8±3.7
10 CV SSV, opt node, BS         91.6±6.5
10 CV SSV, opt prune, BFS       90.4±6.1



6 Conclusions 

Shape-space representation, used in many network-based AIS algorithms, is a means
of representing immune system units as system units, and this representation scheme
also defines the interactions of the system units with the environment by means of
distance criteria. The ‘curse of dimensionality’ problem that appears in
distance-based classification systems affects classification performance
negatively, especially for nonlinear data sets. In this paper, an Attribute
Weighted Artificial Immune System (AWAIS) was proposed to minimize this problem of
shape-space representation. The key concepts and mechanisms in AWAIS are: assigning
weights to attributes in proportion to their importance in determining classes,
using these weights in the calculation of distance measures between the system
units and the input data, and clonal selection and hypermutation mechanisms similar
to those used in other AIS algorithms. Two artificial data sets, Chainlink and
Two-spirals, and one real-world data set, the wine data set, were used in the
performance analyses of AWAIS. The results obtained for the artificial data sets
were compared with those of another important AIS, aiNet. In terms of both
classification accuracy and the number of resulting system units, the proposed
system performs classification at 100% accuracy with fewer system units than aiNet.
To analyze the classification accuracy of AWAIS on a real-world problem, the wine
data set was used, and 98.23% classification accuracy was reached for the optimum
parameter values. Compared with the classification systems used for the same data,
this result is very satisfactory for now, because it supports the view that we are
on the right track in developing supervised artificial immune networks based on
shape-space representation with high accuracies.

Because the study aims at eliminating problems caused by shape-space
representation, the training procedure has similarities with aiNet with respect to
the main metaphors of the immune system that inspired it. One direction for future
work is improving the attribute weighting procedure so that the algorithm performs
well on a wide variety of data sets. In addition, the attribute weighting scheme
can be applied to other AIS algorithms that use shape-space representation, and the
performance of these algorithms can be improved in this way. Attribute weighting
itself can also be done by AISs, by adding some adaptation. A new branch of the
classification field is thus improving day by day toward high-performance
classification systems with Artificial Immune Systems.
Abstract. Prediction by partial match (PPM) is an effective tool to address the
author recognition problem. In this study, we have successfully applied the
trained PPM technique for author recognition on Turkish texts. Furthermore, we
have investigated the effects of recency, as well as of the size of the training
text, on the performance of the PPM approach. Results show that more recent and
larger training texts help decrease the compression rate, which, in turn, leads to
increased success in author recognition. Comparing the effects of the recency
and the size of the training text, we see that the size factor plays a more
dominant role in the performance.



1 Introduction 

Today, with the widespread availability of documents on the Internet, author
recognition is gaining importance. Author recognition is the process of determining
the author of unknown or disputed texts. It not only resolves ambiguity to reveal
the truth, but also settles authorship attribution and plagiarism claims [1].

Discriminating among authors can be accomplished through various techniques.
Most author recognition methods are lexically based, in that they exploit
information about the variety of words, consistency among sentences, use of
punctuation, spelling errors, frequency and average length of words, usage of
filler words within the text, etc. These techniques extract the stylometry
(wordprints) of disputed texts [2]. One of the fundamental notions in stylometry is
the measurement of what is termed the richness or diversity of an author’s
vocabulary. Mathematical models exist for the frequency distributions of the number
of vocabulary items as well as of once-occurring words (hapax legomena) and
twice-occurring words (hapax dislegomena) as stylometric tools [3]. Another
lexically based technique, proposed by Burrows, is to use the frequencies of a set
of common function (context-free) words in the disputed text [4]. This method
requires the selection of the most appropriate set of words that best distinguishes
a given set of authors.

All the lexically based style markers are so highly author and language dependent
that the results of such measures cannot readily be applied to other authors and
languages. As a result, a syntax-based approach, which exploits the frequencies of
the rewrite rules as they appear in a syntactically annotated corpus, has been
proposed. Whether used with high-frequency or low-frequency rewrite rules, the
syntax-based approach yields accuracy results that are comparable to lexically
based methods. Yet the syntactic annotation required to prepare the text for this
method is itself highly complicated, which makes it less common [5].

Another approach, used by Khmelev, considers the sequence of letters of a text as a
Markov chain and calculates the matrices of transition frequencies of letter pairs
to get the transition probabilities from one letter to the other for each disputed
author [6]. Exploiting the principle of maximal likelihood, the correct author is
determined to be the one with the maximal corresponding probability (or the minimal
relative entropy, which is inversely proportional to the probability) [7]. This
technique has been successfully applied to determining the authorship of disputed
Russian texts [8].

A slightly different approach in the area is entropy based: it uses compression
tools for author recognition. Teahan led some compression-based experiments on text
categorization, which is the basis for authorship attribution [9]. Although until
that time it was believed that compression-based techniques yield inferior results
compared to traditional machine learning approaches, Teahan and Harper obtained
competitive results with compression models on text categorization [10].

With compression algorithms that use another text (called the training text) to
extract symbol frequencies before compression, it is possible to determine the
author of disputed texts. The idea is that, when the training text belongs to the
same author as the disputed text, the symbol statistics gathered by the model on
the training text will closely resemble those of the disputed text, which in turn
will help the disputed text to be compressed better. No text written by some other
writer is expected to compress the disputed text better. Prediction by partial
match (PPM) [11] is such an algorithm. In PPM, first a statistical model is
constructed from the symbol counts in the training text, and then the target text
is compressed using these statistics.

There exist some authorship detection studies carried out on Turkish texts. One is
Tur’s study, which uses word-based, stem-based and noun-based language models for
determining the correct author [12]. Among the three methods, the word-based model
yields the best performance.

In the following section, we introduce the approach that we used for the author
recognition problem. In Section 3, we present the results of our implementations,
and in the last section, conclusions together with suggestions for future work are
given.



2 Our Approach 

In this study, we used a compression-based approach employing PPMD+, a derivative
of the PPM algorithm that uses a training text to compress files, for author
recognition on sample Turkish texts. The idea is that determining the correct
author of the text T is just a matter of calculating $\theta(T)$ in Eq. 1, where
H(T|S) is some approximation to the relative entropy of text T with respect to
text S [13]:

$$\theta(T) = \min_i H(T \mid S_i) \qquad (1)$$

In our implementation, T is the disputed text whose author is to be determined and
$S_i$ is the training text belonging to the i-th candidate author. From theory, we
know that compression rate is a measure of entropy, i.e. the higher the entropy of
a source text, the harder it is to compress. Hence, we can use compression rate
(measured in bits per character, bpc) and entropy interchangeably to evaluate our
experimental results. If there are m training texts $S_1, S_2, \ldots, S_m$, then,
using each $S_i$ as the training text, we compress the disputed text T with PPMD+
and rank the resulting bpc values. At the end of m runs we conclude that the
disputed text T belongs to the author whose training text (say $S_i$) yields the
lowest bpc rate, i.e. the highest compression.
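The decision rule can be sketched in Python as follows. PPMD+ itself is not reproduced here: as a stand-in, the sketch uses a static order-2 character model with add-one smoothing, trained on each candidate's text, and reports the cross-entropy of the disputed text in bits per character. The model order, the smoothing, the `alphabet_size` value and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def train(text, order=2):
    """Count which character follows each context of `order` characters."""
    model = defaultdict(Counter)
    for i in range(order, len(text)):
        model[text[i - order:i]][text[i]] += 1
    return model

def bits_per_char(model, text, order=2, alphabet_size=256):
    """Cross-entropy of text under model in bits per character (add-one smoothing)."""
    total_bits, n = 0.0, 0
    for i in range(order, len(text)):
        counts = model.get(text[i - order:i], Counter())
        p = (counts[text[i]] + 1) / (sum(counts.values()) + alphabet_size)
        total_bits -= math.log2(p)
        n += 1
    return total_bits / n if n else 0.0

def attribute(disputed, training_texts, order=2):
    """Return the candidate whose training text yields the lowest bpc, plus all scores."""
    scores = {author: bits_per_char(train(s, order), disputed, order)
              for author, s in training_texts.items()}
    return min(scores, key=scores.get), scores
```

Calling `attribute(T, {"author_1": S_1, "author_2": S_2})` mirrors the m compression runs described above: one score per training text, and the minimum decides the author.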

Besides the author recognition attempts, we further investigated the effect of two
factors, namely the recency and the size of the training text, on the recognition
performance. Intuitively, a more recent, as well as a larger, training text should
be more successful in determining the right author. To test the correctness of this
argument and to determine to what extent each factor, i.e. recency and size,
affects the rate of success of the author recognition attempts, we applied several
tests.



2.1 Prediction by Partial Match (PPM) Algorithm 



A statistics-based compression algorithm needs a model to determine how to
represent the input stream. It then uses this model to encode the input sequence.
PPM is such a modeling technique, which provides symbol statistics to the encoder
[5]. It blends together several fixed-order context models to predict the next
symbol in the input sequence. These distributions are effectively combined into a
single one, and Arithmetic Coding [14] is used to encode the unit that actually
occurs, relative to that distribution. The length of the context determines the
order of the PPM model. In case the current order cannot determine the probability
for the upcoming symbol, a special sequence called the escape sequence is issued
and control is transferred to a lower-order model.

The PPM model takes different names according to the probabilities assigned to
escape events. What we used for our author recognition studies is PPMD+, i.e. the
PPM model with method D. In this algorithm, the escape probability ($e$) and the
symbol probabilities ($p(\phi)$) are calculated as below (Eq. 2), where $c(\phi)$
is the number of times the context was followed by the symbol $\phi$, $n$ is the
number of tokens that have followed, i.e. the sum of the counts for all symbols,
$\sum_{\phi} c(\phi)$; and $t$ is the number of types.

$$e = \frac{t}{2n} \quad \text{and} \quad p(\phi) = \frac{2c(\phi) - 1}{2n} \qquad (2)$$
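As a small worked example of Eq. 2, the following Python sketch computes the escape and symbol probabilities for a single context from its symbol counts. The function name `ppmd_probabilities` is hypothetical, and exact fractions are used only to make the arithmetic easy to check.

```python
from fractions import Fraction

def ppmd_probabilities(counts):
    """Method D estimates for one context.

    counts maps each symbol seen after the context to its count c(phi).
    Returns (escape probability, {symbol: probability}).
    """
    n = sum(counts.values())   # tokens that have followed the context
    t = len(counts)            # distinct symbol types seen
    escape = Fraction(t, 2 * n)
    symbols = {s: Fraction(2 * c - 1, 2 * n) for s, c in counts.items()}
    return escape, symbols

escape, probs = ppmd_probabilities({"a": 3, "b": 1})
```

For the counts a: 3 and b: 1, we have n = 4 and t = 2, so e = 2/8, p(a) = 5/8 and p(b) = 1/8; the three values sum to one, as the escape and symbol probabilities of a context must.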




Fig. 1. Compression via PPM model 



The PPM model is depicted in Figure 1 above. As seen from Figure 1, the input
text is fed into the encoder, which interacts with the adaptive PPM model. The
model estimates a probability distribution for the upcoming symbol based on the
symbols previously seen in the text. The symbol is then encoded using this
probability distribution, and the model is revised to better estimate the upcoming
symbols. PPM is flexible enough to let those symbols be characters, words or
part-of-speech (PoS) tags. In our implementation, we chose the symbols to be
characters. This makes the PPM model independent of the source language.



2.2 Corpus 

As for the corpus, we compiled the daily articles of four journalists (namely
Mehmet Ali Birand, Cetin Altan, Emin Colasan and Melih Asik) in Turkish newspapers
(Hurriyet, Milliyet and Sabah) via the Internet. For each author, we created six
files: one file consisting of the author’s articles before 2000, and five more
belonging to the years between 2000 and 2004, inclusive. For 2004, we only have
the authors’ articles written during the first four months of the year.

The authors we selected are among the daily columnists, each of whom has his 
own writing style. As a side note, Melih Asik frequently allows his column to be par- 
tially shared by other authors or even readers. Very frequently, he quotes reader let- 
ters which may expectedly lead to a much less stable writing style. 



3 Results 

In order to determine the correct author of disputed texts, we employed PPMD+ on
our Turkish corpus. In addition, we investigated the effect of two factors on the
recognition performance: one is the chronological distance, i.e. the recency of the
training text with respect to the disputed text, and the other is the size of the
training text. In the following subsections, these experiments are explained and
their results are given.



3.1 Author Recognition 

The author recognition problem may arise in two different flavors. In the first, we
know the authors of older texts and, using these texts, are required to determine
the correct author of a newer disputed text. In the second, we already have current
texts belonging to known authors at hand and are required to determine the correct
author of an older disputed text. In our implementation, these cases are reflected
by taking the disputed text as the compressed text and the known text as the
training text.

Due to the nature of the PPM algorithm, we expect a training text that is
chronologically closer to the target text to compress the target text better. To
see whether this is the case in an actual implementation, we ran the PPMD+
algorithm on yearly organized Turkish newspaper articles of the four different
columnists. To calculate the classification accuracy for the author recognition
tests we use Eq. 3:

$$\text{accuracy} = \frac{\text{\# of correctly identified texts}}{\text{\# of total texts}} \qquad (3)$$

Assume that we have older texts, belonging to years before 2000, of the four
journalists (Birand, Altan, Colasan and Asik) and we are required to determine the
correct author of these journalists’ newer articles (belonging to the years between
2001 and 2004). This is an instance of the first case of the author recognition
problem mentioned above, where we use the older texts of each author as the
training text to compress these authors’ newer texts with PPMD+. For that, we use
the full size of each article.



Table 1. Author recognition on sample Turkish texts with PPMD+ using full file sizes

                   Training text (size); compression rate (bpc)
Compressed text    Birand_19XX_2000  Altan_19XX_2000  Colasan_19XX_2000  Asik_19XX_2000  Identification
                   (1.7MB)           (3.3MB)          (2.3MB)            (1.5MB)         margin (bpc)
Birand 2001        1.98              2.27             2.22               2.20            0.22
Birand 2002        1.87              1.99             1.94               1.91            0.04
Birand 2003        [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Birand 2004        1.91              2.07             2.01               1.98            0.07
Altan 2001         2.27              [illegible]      2.25               2.17            0.21
Altan 2002         [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Altan 2003         2.09              [illegible]      2.08               2.01            0.06
Altan 2004         [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Colasan 2001       2.15              2.17             1.94               2.03            0.09
Colasan 2002       [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Colasan 2003       2.06              2.10             1.94               1.99            0.05
Colasan 2004       [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Asik 2001          [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Asik 2002          2.20              2.21             2.16               [illegible]     0.11
Asik 2003          [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Asik 2004          2.24              2.25             2.18               [illegible]     0.13

According to Table 1, the correct author of each article is successfully determined
(the lowest bpc value in each row), yielding an accuracy rate of 16/16=100%.

To observe the recency effect, we used the identification margin measure (last
column in Table 1), which we define as the absolute difference between the correct
author’s bpc value and the value nearest to it in each row. In Table 1, as the size
of the compressed (disputed) text becomes larger (years 2002 and 2003), the
identification margin gets narrower. This means that the compression performance is
highly affected by the size factor. In order to remove this size effect, we
equalized the file sizes to the lowest file size in the test set, ~0.25MB for the
disputed texts and ~1.5MB for the training texts, and repeated the above
experiment, with results given in Table 2.
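The identification margin and the accuracy of Eq. 3 can be computed from a table row as in the Python sketch below. The function names and the dictionary layout of a row are assumptions made for illustration; the sample values are the first two Birand rows of Table 1.

```python
def identification_margin(row_bpc, correct_author):
    """Absolute difference between the correct author's bpc and the nearest other value."""
    correct = row_bpc[correct_author]
    others = [v for a, v in row_bpc.items() if a != correct_author]
    return min(abs(correct - v) for v in others)

def accuracy(rows):
    """rows: list of (row_bpc, correct_author) pairs.

    A text counts as correctly identified when its true author's training
    text gives the lowest bpc in the row.
    """
    hits = sum(1 for row, author in rows if min(row, key=row.get) == author)
    return hits / len(rows)

row_2001 = {"Birand": 1.98, "Altan": 2.27, "Colasan": 2.22, "Asik": 2.20}
row_2002 = {"Birand": 1.87, "Altan": 1.99, "Colasan": 1.94, "Asik": 1.91}
margins = [identification_margin(r, "Birand") for r in (row_2001, row_2002)]
```

For these two rows the margins come out as approximately 0.22 and 0.04 bpc, matching the last column of the table.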



Table 2. Author recognition on sample Turkish texts with PPMD+ using equal file sizes

                         Training text (size); compression rate (bpc)
Compressed text          Birand_19XX_2000  Altan_19XX_2000  Colasan_19XX_2000  Asik_19XX_2000  Identification
(approx. size)           (1.5MB)           (1.5MB)          (1.5MB)            (1.5MB)         margin (bpc)
Birand 2001 (0.25MB)     1.98              2.42             2.26               2.20            0.22
Birand 2002 (0.25MB)     [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Birand 2003 (0.25MB)     [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Birand 2004 (0.25MB)     1.63              2.28             2.09               2.01            0.08
Altan 2001 (0.25MB)      [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Altan 2002 (0.25MB)      [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Altan 2003 (0.25MB)      2.25              2.31             2.27               2.11            0.04
Altan 2004 (0.25MB)      2.28              2.34             2.29               2.15            0.05
Colasan 2001 (0.25MB)    [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Colasan 2002 (0.25MB)    2.19              2.40             2.07               2.08            0.01
Colasan 2003 (0.25MB)    2.19              2.40             [illegible]        2.09            0.01
Colasan 2004 (0.25MB)    [illegible]       [illegible]      [illegible]        [illegible]*    [illegible]
Asik 2001 (0.25MB)       [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Asik 2002 (0.25MB)       2.43              2.59             2.43               2.22            0.21
Asik 2003 (0.25MB)       [illegible]       [illegible]      [illegible]        [illegible]     [illegible]
Asik 2004 (0.25MB)       [illegible]       [illegible]      [illegible]        [illegible]     [illegible]

*: Slightly less than 2.10 bpc.
We calculate the accuracy rate for Table 2 as 11/16=68.75%. The difference between
the accuracy rates in Table 1 and Table 2 (100% and 68.75%, respectively) reveals
the size effect: when the training text and/or disputed text sizes are limited, the
performance of the author recognition attempts decreases considerably.

Results in Table 2, where the size factor is neutralized, show that (except for
Altan’s case where recognition fails) as the text in question becomes less recent
with respect to the training text, the identification margin gets narrower. This
feature, which is valid for all of the four authors, can be attributed to the
recency effect.



3.2 Recency Effect 



Table 3. The effect of recency of the training text on Birand’s articles - full sizes

                        Training text; compression rate (bpc)
Filename      Size      Birand_19XX  Birand_2000  Birand_2001  Birand_2002  Birand_2003  Birand_2004
              (KB)
Birand_19XX   1,036     -            ← 1.44       1.43         1.45         1.46         1.44
Birand_2000   646       2.17 →       -            2.01         1.95         1.97         2.02
Birand_2001   267       2.34         1.96 →       -            ← 2.01       2.03         2.10
Birand_2002   1,120     1.99         1.85         1.89 →       -            ← 1.79       1.85
Birand_2003   1,125     1.98         1.84         1.88         1.77 →       -            ← 1.82
Birand_2004   444       [illegible]  1.88         1.94         1.78         [illegible] →  -


Table 3 shows the effect of the recency of the training text. While compressing
each year’s articles with PPMD+, the articles of every other year are used in turn
as the training text. It makes no sense to compress a particular year’s articles by
using the same text as the training text; this is why the cell of the compressed
text itself is shaded (left empty) in each row of the table. The arrows in the
table indicate the nearest neighbors (upper and lower) of the compressed text in
each row. What we expect from this experiment is that each year’s articles are best
compressed when the training text is a nearest neighbor. In four out of six cases
our expectation is met, whereas in two cases it is not. This anomaly may be due to
the different sizes of the texts available for each year. To see whether this was
the reason, we normalized the text sizes for each year and repeated the
measurements. For that, we chopped each year’s text to the size of year 2004’s text
for that particular author. We also excluded the years having less text than year
2004. The reduction was done beginning from the end of each year’s texts, because
newer texts are nearer to year 2004’s articles and hence would compress better. The
results are presented in Tables 4 through 7:



Table 4. The effect of recency of the training text on Birand’s articles - equal sizes

                          Training text (bpc)
Filename      Size (KB)   Birand 19XX   Birand 2000   Birand 2002   Birand 2003   Birand 2004
Birand 19XX               [shaded]      ← 1.49        1.51          1.51          1.51
Birand 2000               2.20 →        [shaded]      ← 2.01        2.02          2.02
Birand 2002               2.13          1.92 →        [shaded]      ← 1.89        1.90
Birand 2003               2.13          1.93          1.89 →        [shaded]      ← 1.87
Birand 2004               2.09          1.90          1.86          1.84 →        [shaded]





As seen in Table 4, with the size effect removed it becomes clearer that recency
plays a significant role in the performance of the PPMD+ algorithm. For each year’s
articles of Birand, the best performance is obtained when the training text is a
nearest neighbor. Tables 5 through 7 show similar results for the other authors as well.



Table 5. The effect of recency of the training text on Altan’s articles - equal sizes

                         Training text (bpc)
Filename     Size (KB)   Altan 19XX   Altan 2000   Altan 2002   Altan 2003   Altan 2004
Altan 19XX   436         [shaded]     ← 1.58       1.60         1.61         1.61
Altan 2000   436         2.31 →       [shaded]     ← 2.08       2.12         2.12
Altan 2002   436         2.31         2.08 →       [shaded]     ← 2.08       2.09
Altan 2003   436         2.26         2.06         2.02 →       [shaded]     [illegible]
Altan 2004   436         2.29         2.09         2.07         2.07 →       [shaded]





For Altan’s articles, the best compression performance is obtained whenever the
nearest neighbor is the training text (Table 5). In three out of the five cases the
previous year’s articles, and in two out of the five cases the following year’s
articles, give the best compression as training texts. In all cases, the best results
are obtained with the most recent training texts. Table 6 shows that a similar
tendency exists for Colasan’s articles.



Table 6. The effect of recency of the training text on Colasan’s articles - equal sizes

                           Training text (bpc)
Filename       Size (KB)   Colasan 19XX   Colasan 2000   Colasan 2002   Colasan 2003   Colasan 2004
Colasan 19XX               [shaded]       ← 1.85         1.86           1.86           1.86
Colasan 2000               2.24 →         [shaded]       ← 2.03         2.04           2.05
Colasan 2002               2.25           [illegible]    [shaded]       ← 2.01         2.02
Colasan 2003               2.26           2.05           [illegible]    [shaded]       [illegible]
Colasan 2004               2.26           2.05           2.02           2.01 →         [shaded]



According to Table 7, in three out of four cases the most recent texts provide the
best compression when used as training text. In only one case (the last row of
Table 7) does a less recent text yield slightly better compression. This may be due
to the idiosyncrasy of this specific author, as explained in Section 2.2.

Considering the recency experiments as a whole, the most recent training text
compresses consistently better, but only by a small margin.
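
The "three out of four" count for Table 7 can be rechecked mechanically. The sketch below is only an illustration: the dictionary layout and the function names are assumptions made here, and the bpc values are transcribed from Table 7. A row counts as a hit when one of its nearest-neighbor training years reaches the lowest bpc in that row, with ties counted as hits.

```python
# bpc values transcribed from Table 7 (Asik, equal sizes).
# Layout: compressed year -> {training year: bpc}. A year is never used
# as its own training text, so it is absent from its own row.
TABLE_7 = {
    2000: {2002: 2.16, 2003: 2.17, 2004: 2.16},
    2002: {2000: 2.19, 2003: 2.17, 2004: 2.20},
    2003: {2000: 2.11, 2002: 2.07, 2004: 2.10},
    2004: {2000: 2.13, 2002: 2.14, 2003: 2.14},
}


def nearest_neighbors(year, years):
    """Return the years immediately before and after `year` in sorted order."""
    ordered = sorted(years)
    pos = ordered.index(year)
    return [ordered[i] for i in (pos - 1, pos + 1) if 0 <= i < len(ordered)]


def count_nearest_best(table):
    """Count rows whose lowest bpc is reached by a nearest-neighbor training text."""
    years = list(table)
    hits = 0
    for year, row in table.items():
        best = min(row.values())
        if any(row[n] == best for n in nearest_neighbors(year, years)):
            hits += 1
    return hits


print(count_nearest_best(TABLE_7), "of", len(TABLE_7))
```

Only the 2004 row misses: training on the 2000 text (2.13 bpc) is marginally better than training on the 2003 text (2.14 bpc).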



Table 7. The effect of recency of the training text on Asik’s articles - equal sizes

                        Training text (bpc)
Filename    Size (KB)   Asik 2000   Asik 2002   Asik 2003   Asik 2004
Asik 2000   456         [shaded]    2.16        2.17        2.16
Asik 2002   456         2.19 →      [shaded]    ← 2.17      2.20
Asik 2003   456         2.11        2.07 →      [shaded]    ← 2.10
Asik 2004   456         2.13        2.14        2.14 →      [shaded]





The recency effect is not only a matter of the same topics being discussed in
articles that are close in time, but also of variation in the author’s writing style.
Regarding the former factor, since the articles under consideration are texts from
daily newspapers, it is to be expected that they are strongly affected by the current
political, economic and social climate. Regarding the latter factor, authors do
develop their writing style over time: there are “hot” words and sayings that change
with time, and authors are affected by those trends while writing. This effect has
already been investigated in the writings of two famous Turkish writers, Cetin Altan
and Yasar Kemal, by Can and Patton [15]. The results of that study provide statistical
evidence that the writing style of these authors changes with time.



3.3 Size Effect 

We have investigated the effect of training text size on the performance of
compression with PPMD+. For that, we have used the articles of each author written
before year 2004 as the training text to compress his articles written in 2004. In
order to observe the effect of gradually changing the training text size, we measured
the compression performance of the PPMD+ algorithm under ten different training text
sizes. By employing training texts of size 1K, 5K, 10K, 50K, 100K, 250K, 500K, 1000K,
2000K and 4000K, we compressed each author’s year 2004 articles. The results of each
run are given in the charts below:




[Chart omitted: vertical axis from 1.00 to 2.50; horizontal axis: training text size, up to 4000K]



Fig. 2. Effect of training text size on Birand’s articles 



As seen from Figure 2, increasing the training text size gradually improves the
compression of Birand’s articles, as expected. The rate of improvement, calculated as
the percentage difference between the largest (2.08 bpc with 1K training text) and the
smallest (1.70 bpc with 4000K training text) bpc values, is 18.25% for Birand’s case.
This represents a considerable performance improvement.
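
As a quick arithmetic check, the rate of improvement can be recomputed from the two quoted bpc values. The sketch below assumes the rate is taken relative to the largest value; the function name is chosen here for illustration. With the rounded values 2.08 and 1.70 it gives about 18.27%, close to the reported 18.25%; the small gap presumably comes from rounding of the bpc values.

```python
def improvement_rate(largest_bpc, smallest_bpc):
    """Relative bpc reduction, in percent, from the worst run to the best run."""
    return 100.0 * (largest_bpc - smallest_bpc) / largest_bpc


# Values quoted in the text for Birand: 1K versus 4000K training text.
rate = improvement_rate(2.08, 1.70)
print(f"{rate:.2f}%")
```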

We also plotted the variation of compression performance with changing training text
size for Altan’s articles. This plot is shown in Figure 3. According to Figure 3, the
bpc rates for Altan’s articles improve with increasing training text size, as was the
case with Birand’s articles. The rate of improvement for Altan’s articles is 19.30%.





[Chart omitted: bpc versus training text size]



Fig. 3. Effect of training text size on Altan’s articles 



The compression rate vs. training text size charts for Colasan and Asik (Figure 4
and Figure 5) show the same trend as those of the other journalists, i.e., using a
larger training text helps improve the compression rate. The improvement rates are
19.60% and 18.40% for Colasan and Asik, respectively.

Fig. 4. Effect of training text size on Colasan’s articles

Fig. 5. Effect of training text size on Asik’s articles



The above experiments clearly indicate that employing a larger training text yields
better compression. Evaluating the training text size experiments for all four
authors, we note that the performance gain from the smallest size to the largest size
is very consistent, at around 19%. For the author recognition problem, this property
can be used to improve the performance of recognition studies implemented with
compression tools: selecting larger training texts increases the success of
determining the right author.
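
The size experiment can be imitated with any adaptive compressor. The sketch below is not PPMD+: it uses Python's standard `lzma` module as a stand-in, and it approximates the cost of the disputed text by how much the compressed size grows when that text is appended to the training text. The function names, the toy vocabulary and the sizes are all assumptions made for illustration.

```python
import lzma
import random


def conditional_bpc(training: bytes, text: bytes) -> float:
    """Approximate bits per character needed for `text` after seeing `training`.

    Stand-in for a trained PPM model: the cost of `text` is taken as the
    growth in compressed size when it is appended to the training text.
    """
    base = len(lzma.compress(training))
    both = len(lzma.compress(training + text))
    return 8.0 * (both - base) / len(text)


def bpc_by_training_size(training: bytes, text: bytes, sizes):
    """Return {size: bpc}, keeping only the last `size` bytes of the training text."""
    return {size: conditional_bpc(training[-size:], text) for size in sizes}


def toy_corpus(seed: int, n_words: int) -> bytes:
    """Deterministic pseudo-text, used only to exercise the functions."""
    rng = random.Random(seed)
    vocab = ["hukuk", "meclis", "ekonomi", "spor", "bugun", "yarin", "halk", "devlet"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("utf-8")


training = toy_corpus(1, 5000)
disputed = toy_corpus(2, 500)
for size, bpc in bpc_by_training_size(training, disputed, [1000, 10000, 30000]).items():
    print(size, round(bpc, 3))
```

For author recognition, the same `conditional_bpc` call would be repeated with each candidate author's training text, and the author giving the lowest value would be selected.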



4 Conclusion 

Using the trained PPM algorithm for author recognition was introduced in [11] for
English texts. In this work, we have applied this approach to Turkish texts. With
larger texts the approach resulted in 100% accuracy. On the other hand, when the
training text and/or disputed text sizes are limited, classification accuracy drops
significantly, to 68.75%.

We have also investigated the effect of the recency and size of the training text on
the success of author recognition. We found that the more recent and the larger the
training text is, the better the compression rate becomes. This presumably leads to a
higher probability of recognizing the right author.

Comparing the effects of recency and size of the training text, we see that size
plays a more dominant role than recency in the performance of author recognition.

Further tests with more distant writings, together with larger training text sizes,
will be possible as more and more texts become available online. Experiments on the
data set used in this study can be extended with a Support Vector Machine (SVM)
classifier to compare the performance of the two approaches. PPM has already matched
and even outperformed SVM on English texts [10]. The same experiments can be carried
out on our Turkish data set.
Abstract. Recently, there has been a great deal of interest in intelligent control
of robotic manipulators. The artificial neural network (ANN) is a widely used
intelligent technique for this purpose. Using ANNs, these controllers learn about the
systems that they are to control online. In this paper, a neural network
controller was designed using the traditional generalized predictive control
(GPC) algorithm. The GPC algorithm, which belongs to a class of digital
control methods known as Model Based Predictive Control, requires long
computation times and can result in poor control performance in robot
control. Therefore, to reduce the process time, in other words, to avoid the
heavy mathematical computations of GPC, a neural network was
designed for a 3-joint robot. The performance of the designed control system
was shown to be successful using simulation software that includes the
dynamics and kinematics of the robot model.



1 Introduction 

Robotic manipulators have become increasingly important in the field of flexible
automation. For an autonomous robot controller to be useful, it must operate in real
time, pick up novel payloads, and be expandable to accommodate many joints. One of
the indispensable capabilities for versatile applications of mechanical robotic
manipulators is high-speed and high-precision trajectory tracking [1,2]. Recently,
artificial neural networks have been successfully used to model linear and nonlinear
systems. In process control applications, the controller makes explicit use of the
neural network based process model to determine the k-step-ahead prediction of the
process outputs [3]. There has also been much interest in learning the form of simple
models of networks of neurons. The overall complexity of robot control problems, and
the ideal of a truly general robotic system, have led to much discussion on the use of
artificial neural networks to learn the characteristics of the robot system, rather
than having to specify explicit robot system models.

Artificial neural network based predictive control has been studied in several
papers [1,4-6]. Arahal et al. [6] have presented a study on neural identification
applied to predictive control of a solar plant. In their paper, a general
identification methodology is applied to obtain neural network predictors for use in
a nonlinear predictive control scheme derived from the generalized predictive
controller structure. Gupta and Sinha [1] have designed an intelligent control system
using a PD controller and an artificial neural network. They have found the algorithm
successful. On the other hand, they noted that “the learning system used does not
suffer from the restriction of repetitive task since the training information is in
the form of the system transfer characteristics at individual points in the state
space, and is not explicitly associated with the trained trajectories. Obviously, it
would be difficult to train a system to generate correct control outputs for every
possible control objective.” The same behavior is observed in this study.

In this paper, a three-joint robotic manipulator was controlled using the
Generalized Predictive Control algorithm [7-9] in order to prepare data. Simulation
software that includes all the dynamics and kinematics equations of the defined robot
model was used. The simulation software was previously developed in a study of the
behavior of the robot along different vision-based trajectories [10]. The robotic
manipulator was controlled along different trajectories with GPC, and during these
simulations a training set was prepared for the neural network. The aim of this paper
is to reduce the process time by exploiting the fast on-line operation of neural
networks instead of the heavy mathematical structure of the GPC algorithm.

2 The Dynamics and Kinematics Model of the Robot

A three-joint robot model, shown in Figure 1, was used in the simulation. The
simulation software includes the dynamics and kinematics equations for the given robot
model. As is well known, the dynamics of a robot arm [11] can be expressed by the
following general set of equations, given in Eq. (1):



$$\sum_{j=1}^{n} d_{kj}(q)\,\ddot{q}_j + \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ijk}(q)\,\dot{q}_i\,\dot{q}_j + f_k(q) = \tau_k \qquad (1)$$

k = 1, ..., n, i = 1, ..., n, j = 1, ..., n, and where:

$q_j$ : jth generalized coordinate
$q$ : generalized coordinate vector
$\tau_k$ : kth generalized force
$n$ : number of joints
$d_{kj}$ : inertial coefficients
$c_{ijk}$ : centrifugal and Coriolis coefficients
$f_k$ : loading item due to gravity
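
To make the structure of Eq. (1) concrete, the sketch below evaluates the generalized forces for given coefficient arrays. It assumes the standard form in which the centrifugal and Coriolis coefficients multiply the velocity products of joints i and j. The function name and the two-joint numbers are hypothetical and are not the parameters of the robot used in this study.

```python
def joint_torques(d, c, f, qd, qdd):
    """Evaluate tau_k = sum_j d[k][j]*qdd[j] + sum_i sum_j c[i][j][k]*qd[i]*qd[j] + f[k].

    d, c and f hold the coefficients already evaluated at the current q;
    qd and qdd are the joint velocities and accelerations.
    """
    n = len(f)
    tau = []
    for k in range(n):
        inertial = sum(d[k][j] * qdd[j] for j in range(n))
        velocity = sum(c[i][j][k] * qd[i] * qd[j] for i in range(n) for j in range(n))
        tau.append(inertial + velocity + f[k])
    return tau


# Hypothetical two-joint example (illustrative numbers only).
d = [[2.0, 0.5], [0.5, 1.0]]
c = [[[0.0, 0.1], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]  # only c[0][0][1] is nonzero
f = [9.81, 0.0]
print(joint_torques(d, c, f, qd=[1.0, 0.0], qdd=[0.5, 0.0]))
```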



The parameters of the robotic manipulator [12] used in the simulations are given
in Table 1. The simulation software was developed using OpenGL in order to observe the
movements of the robot model visually, and a view of the robot model is given in
Fig. 1. The simulation software also reports the speed and position at the end of the
control process, to provide information about the accuracy of the obtained control
system. The robot control and neural network software was coded in the Delphi
programming language.
In this study, the robotic manipulator was modeled using its dynamic equations.
In addition, the disturbances that exist in a real robot implementation were
expressed in the dynamic model equations by a coefficient. The aim of this is to see
the effects of real-life disturbances such as rusting, friction and other effects.

Table 1. The parameters of the robotic manipulator used

i        Joint-1    Joint-2    Joint-3    Units
m_i      13.1339    10.3320    6.4443     kg
a_i      0.1588     0.445      0.10       m
α_i      π/2        0.0        π/2        Radian
x*_i     -0.0493    -0.1618    0.0        m
z*_i     0.0        0.0        0.2718     m
k_i11    5.6064     3.929      82.0644    m² × 10⁻³
k_i22    8.9196     47.8064    81.9353    m² × 10⁻³
k_i33    13.2387    45.4838    1.400      m² × 10⁻³



In Table 1:

m_i : The weight of arm i
a_i : Shift distance of the ith coordinate system along the rotating axis of
α_i : The orientation of the z_i axis according to the z_{i-1} axis
x*_i, z*_i : The coordinates of the gravity center of arm i according to the ith coordinate frame
k_ijj : Gyration radius (I_ij = m_i k_ijj)



Figure 1 shows the kinematic structure of the robotic manipulator. It gives
information about the Cartesian position of the end effector as a function of the
joint angles; this is known as the direct kinematics solution. On the other hand, it
is necessary to compute the joint angles for given Cartesian coordinates, especially
in vision-based applications [13]. This is known as the inverse kinematics solution.
The aim of this paper is to obtain a neural network controller trained with data
obtained from the GPC algorithm. Therefore, the inverse kinematics problem is not
detailed here.

Fig. 1. A view of the 3-D simulated robotic manipulator

3 Design of the Neural Network Controller 

An Artificial Neural Network (ANN) is a parallel-distributed information processing
system. The main idea of the ANN approach resembles the functioning of the human brain
[14]. It has a quicker response and higher performance than a sequential digital
computer. The learning feature of the neural network is used in this study.

This section presents the design of the neural network, which works online to
produce the torque values applied to each joint in the system. A back-propagation
neural network that uses the gradient descent learning algorithm with a sigmoid
activation function was used to model the GPC algorithm. The data prepared during the
implementation of the traditional GPC algorithm were used to train the neural
network. The training process has been carried out only for some regions of the
robot’s work volume. Designing a controller that generalizes to the whole work volume
is very time consuming because of the difficulty of training.

3.1 Data Preparation 

To train the neural network, a training set has been prepared from the results of
the GPC implementation. The manipulator has been controlled along different
trajectories to generate the data for the training set. The designed neural network
has 12 inputs and 3 outputs. To obtain the torque values at time t as outputs, the
torque values at times (t-1) and (t-2), together with the y and y-reference values at
time (t-1), are used at the input stage as 12 elements. These data have been generated
using the GPC controller for different trajectories. The trajectories have been
selected uniformly to model the GPC algorithm. In the preparation of the training set,
payload variations have also been taken into account; the payload is varied between
0 g and 10000 g. Because of the characteristics of the sigmoid activation function
used in training the backpropagation neural network, all data in the training set
have been normalized between “0” and “1”.
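
The input layout and the scaling step described above can be sketched as follows. The ordering of the four blocks inside the 12-element vector and the use of per-element minimum and maximum values taken from the training set are assumptions; the text only states which quantities are used and that the data are scaled to the range 0 to 1. Function names are chosen here for illustration.

```python
def build_input(u_prev1, u_prev2, y_prev1, y_ref_prev1):
    """Concatenate four 3-joint vectors into the 12-element network input.

    u_prev1, u_prev2: joint torques at times t-1 and t-2.
    y_prev1, y_ref_prev1: outputs and their references at time t-1.
    """
    vector = list(u_prev1) + list(u_prev2) + list(y_prev1) + list(y_ref_prev1)
    if len(vector) != 12:
        raise ValueError("expected four vectors of three joint values each")
    return vector


def min_max_normalize(columns_min, columns_max, vector):
    """Scale each element to [0, 1] using per-element bounds from the training set."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, lo, hi in zip(vector, columns_min, columns_max)
    ]


# Hypothetical raw values, for illustration only.
raw = build_input([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])
scaled = min_max_normalize([0.0] * 12, [12.0] * 12, raw)
print(scaled)
```

The same bounds would have to be stored and reused online, so that the controller sees inputs on the scale it was trained with.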
3.2 Training of the Neural Network

In the off-line training of the backpropagation neural network, 13000 input and output
vector sets are generated using the simulation software. 12000 of these are used as
the learning set, and the others are used for testing. The learning rate and the
momentum rate are experimentally chosen as 0.3 and 0.85, respectively. The error at
the end of learning is 0.007895. The error is computed as the mean square error
(MSE) [15].



Fig. 2. The topology of the designed neural network



The training process was completed in approximately 7,500,000 iterations.
After the off-line training is finished, the neural network that works online is coded
with the obtained synaptic weights, as seen in Figure 2. The neural network has 20
neurons in the hidden layer; the aim was to obtain the neural network controller with
the smallest number of neurons in the hidden layer. An example of the obtained torque
curve is given in Figure 3.
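
For readers unfamiliar with the training rule, the following is a minimal sketch of a 12-20-3 sigmoid network trained by backpropagation with momentum, using the learning rate 0.3 and momentum 0.85 quoted above as defaults. It is not the Delphi implementation of this study: the weight initialization range, the sample-by-sample update order and the class and method names are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """One-hidden-layer sigmoid network trained by backpropagation with momentum."""

    def __init__(self, n_in=12, n_hidden=20, n_out=3, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, needed for the momentum term.
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, o

    def train_step(self, x, target, lr=0.3, momentum=0.85):
        xb, hb, o = self.forward(x)
        # Output deltas: error times the sigmoid derivative o * (1 - o).
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
        # Hidden deltas, computed before the output weights are changed.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(o)))
            for j in range(len(self.w1))
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                self.v2[k][j] = lr * d_out[k] * hb[j] + momentum * self.v2[k][j]
                row[j] += self.v2[k][j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                self.v1[j][i] = lr * d_hid[j] * xb[i] + momentum * self.v1[j][i]
                row[i] += self.v1[j][i]

    def mse(self, samples):
        total = 0.0
        for x, target in samples:
            o = self.forward(x)[2]
            total += sum((t - y) ** 2 for t, y in zip(target, o)) / len(target)
        return total / len(samples)
```

Calling `train_step` once per training vector and monitoring `mse` on a held-out set mirrors the learning and test split described above.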
4 Simulation Results

The designed neural network controller, which is obtained by modeling the GPC
controller, is implemented for a trajectory using the simulation software. The block
diagram of the neural network control implementation is shown in Fig. 4. The
results have been compared with those obtained from GPC to assess the success of the
neural network controller. The simulation results for the traditional GPC controller
and the designed neural network controller are given in Fig. 5 and Fig. 6,
respectively. The same trajectory has been used for both controllers to observe the
accuracy of the ANN controller compared to the GPC controller. In these simulations,
cubic trajectory planning is used as the path-planning algorithm. Its equation, which
has been used to compute the reference position and speed values, is given in Eq. (2)
below. The speed equation can be obtained from this equation by differentiation with
respect to time. A second differentiation gives the acceleration equation.



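$$\theta_i(t) = \theta_{i0} + \frac{3}{t_f^{2}}\,(\theta_{if} - \theta_{i0})\,t^{2} - \frac{2}{t_f^{3}}\,(\theta_{if} - \theta_{i0})\,t^{3}, \qquad i = 1, 2, 3 \qquad (2)$$

where

$t_f$ : total simulation time,
$\theta_{if}$ : the final angular position of the ith joint,
$\theta_{i0}$ : the starting position of the ith joint,
$t$ : time,
$n$ : the number of joints.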
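
The reference generation can be sketched in a few lines. The code below assumes the usual cubic polynomial with zero speed at both ends (coefficients 3 and -2 on the squared and cubed terms), and obtains the speed and acceleration by differentiating the position once and twice. The function name is hypothetical.

```python
def cubic_trajectory(theta0, thetaf, tf, t):
    """Return (position, speed, acceleration) of one joint at time t, 0 <= t <= tf."""
    delta = thetaf - theta0
    a2 = 3.0 * delta / tf ** 2
    a3 = -2.0 * delta / tf ** 3
    position = theta0 + a2 * t ** 2 + a3 * t ** 3
    speed = 2.0 * a2 * t + 3.0 * a3 * t ** 2      # first derivative of position
    acceleration = 2.0 * a2 + 6.0 * a3 * t        # second derivative of position
    return position, speed, acceleration


# Illustrative values: move one joint from 0 rad to 1 rad in 2 s.
for step in range(5):
    t = 0.5 * step
    print(t, cubic_trajectory(0.0, 1.0, 2.0, t))
```

The joint starts and ends at rest, and the speed peaks at the midpoint of the motion.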
Fig. 4. The block diagram of the trajectory control implemented by using the designed ANN
controller

The result of the GPC controller is found to be successful. However, the error in
the speed curves is large at the starting points; the GPC algorithm brings the system
under control a few iterations later. The same error is also observed in the ANN
controller results.

The main strength of GPC lies in its assumptions about the future control
trajectories. However, it has a complex computational structure and needs a great deal
of computation time. A neural network, on the other hand, can give results rather
quickly.

In the simulation studies, it is observed that the training process is very
important for the accuracy of the obtained ANN control system. Generalization is also
achieved, in that the controller reproduces a situation that was absent from the
training set. Nevertheless, the system is more successful for a trajectory that is in
the training set than for one that is not. In Fig. 6, the result is taken from a
trajectory that is not in the training set, to demonstrate the generalization of the
designed ANN controller.









Fig. 5. The results of the GPC controller. Joint position and references are shown in (a), (b),
and (c); joint speed and references are shown in (d), (e), and (f)

Fig. 6. The results of the neural network controller. Joint position and references are shown in
(a), (b), and (c); joint speed and references are shown in (d), (e), and (f)



5 Conclusions 

Through the simulation studies, it was found that robotic manipulators can be
controlled by learning. In this study, the GPC algorithm was modeled using artificial
neural networks. The performance of the neural network controller, used instead of
GPC, was found to be successful. It was observed that the prepared training set and
the training sensitivity of the neural network strongly affect the system performance.
However, it is also difficult to imagine a useful non-repetitive task that truly
involves making random motions spanning the entire control space of the mechanical
system. This leads to the concept of an intelligent robot as one that is trained for a
certain class of operations, rather than one trained for virtually all possible
applications.

The most important idea of this study is that the Generalized Predictive Control
algorithm requires heavy mathematical computation, whereas neural networks are capable
of fast, on-line operation. Modeling GPC with neural networks can play an important
role in real-time systems by reducing the process time. In future studies, the
developed control algorithm can be tested on a real robot to observe its real-time
accuracy and to compare the results with those of the simulations.
Abstract. A novel Ant Colony Optimization (ACO) strategy with an external
memory containing horizontal and vertical trunks from previously promising
paths is introduced for the solution of the wall-following robot problem. Ants
construct their navigations by retrieving linear path segments, called trunks, from
the external memory. Selection of trunks from lists of available candidates is
made using a Greedy Randomized Adaptive Search Procedure (GRASP) instead of the
pure greedy heuristic used in traditional ACO algorithms. The proposed algorithm
is tested on several arbitrary rectilinearly shaped room environments with random
initial direction and position settings. It is experimentally shown that this novel
approach leads to good navigations within reasonable computation times.



1 Introduction 

Wall-following autonomous robot (WFAR) navigation is a well-known problem in
the field of machine learning, and different metaheuristics are used for its solution.
The goal of a wall-following robot is to navigate along the walls of an environment at
some fairly constant distance. The advantage of the wall-following domain is that it
provides a simple test case to tell whether an autonomous robot is actually succeeding
at the given task or not. Genetic Programming (GP), Genetic Algorithms (GAs), Fuzzy
Logic, and Neural Networks are commonly used methods to obtain wall-following
behavior for an autonomous mobile robot. Mataric [1] has implemented a subsumption
architecture in which one complex behavior is decomposed into many simple
layers of increasingly more abstract behaviors, such that a lower layer can overrule
the decision of the layer above it. Mataric used the subsumption architecture for
autonomous mobile robot control to achieve four different tasks: strolling,
collision avoidance, tracing convex boundaries, and tracing the general boundary of a
room. The main difficulty of the subsumption architecture is that writing a
composition of task-achieving behaviors able to solve a particular problem requires
considerable programming skill and ingenuity. Koza [2] used Genetic Programming
(GP) for the evolution of wall-following behavior. Tree-structured computer
programs consisting of robot instructions and environmental parameters are evolved
to achieve the wall-following behavior. Ross et al. [3] improved the work of Koza by
adding automatically defined functions (ADFs) to their GP implementation. ADFs
allow the representation of more complicated robot behavior and give flexibility in
program development, since problem-adapted instructions are developed by evolution.
Braunstingl et al. [4] used Fuzzy Logic Controllers and GAs for the navigation of an
autonomous mobile robot. The perceptual information of the sensors is passed to the
fuzzy system without modeling the walls or obstacles of an environment. The rule
base of the fuzzy system is designed manually, and then a GA is used to find the
optimum membership functions.

Ant Colony Optimization (ACO) is one of the most popular metaheuristic
approaches for the solution of complex optimization problems [5-7]. Ant algorithms
have been inspired by the behavior of real ants. In particular, Argentine ants
are capable of selecting the shortest path, among a set of alternative paths, from
their nest to food sources [8]. Ants deposit a chemical trail (or pheromone trail) on
the path that they follow, and this trail attracts other ants to take the path that
has the highest pheromone concentration. Pheromone concentration on a path thus plays
a key role in the communication between real ants. Since the first ants coming back
to the nest are those that took the shortest path twice, a higher pheromone
concentration is present on the shortest path. Over a finite period of time, this
reinforcement process results in the selection of the shortest pathway [9].

In this paper, an external memory supported ACO approach is presented for the
WFAR navigation problem. When the classical ACO algorithm is applied to the WFAR
problem, two important observations are made, which form the main inspiration behind
the proposed approach. Firstly, a particular point inside the room may be
simultaneously on low- and high-quality paths. Hence, pheromone deposition on single
nodes over the 2D plane may be misleading, because pheromone updates due to
low-quality paths will hide the visibility due to a smaller number of high-quality
navigations. Secondly, one should incorporate knowledge from previous iterations to
exploit accumulated experience-based knowledge for the generation of correct decisions
for future moves. The proposed approach overcomes these two difficulties by using a
trunk-based path construction methodology, rather than the node-based approach, and by
using an external memory of trunks, extracted from high-quality paths of previous
iterations, for knowledge accumulation. The details of the implementation and the
algorithmic description of the proposed approach are given in Section 3.

This paper is organized as follows. A brief description of traditional ACO
algorithms is given in Section 2. In Section 3, implementation details of the proposed
external memory-based ACO strategy are discussed. Experimental results are reported
in Section 4. Section 5 concludes the paper.



2 Traditional ACO Algorithms 

The ACO metaheuristic can be applied to any discrete optimization problem for which
some solution construction mechanism can be described [5]. The ants in ACO implement a
greedy construction heuristic based on pheromone trails on the search space. The
main task of each artificial ant in the simple ant colony optimization algorithm is
simply to find a shortest path between a pair of nodes on a graph onto which the
problem representation is suitably mapped. Each ant applies a step-by-step
constructive decision policy to construct solutions to the problem.
At each node of the graph onto which the search space of the problem is mapped, local
information maintained on the node itself and/or its outgoing arcs is used in a
stochastic way to decide the next node to move to. The decision rule of an ant k
located in node i uses the pheromone trails $\tau_{ij}$ to compute the probability of
choosing node $j \in N_i$ as the next node to move to, where $N_i$ is the set of
one-step neighbors of node i. The probability $P_{ij}^{k}$ is calculated as follows [6]:

$$P_{ij}^{k} = \begin{cases} \tau_{ij}, & \text{if } j \in N_i \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$



At the beginning of the search process, a small amount of pheromone $\tau_0$ is
assigned to all the arcs of the graph. When all ants have built their solutions, they
deposit a constant amount $\Delta\tau_{ij}$ of pheromone on the arcs they used. An ant
that moves from node i to node j at time t changes the pheromone value $\tau_{ij}$ as
follows [6]:

$$\tau_{ij}(t+1) \leftarrow \tau_{ij}(t) + \Delta\tau_{ij}, \quad \text{where } \Delta\tau_{ij} \leftarrow \sum_{k=1}^{\text{all ants}} \Delta\tau_{ij}^{k} \qquad (2)$$

$$\Delta\tau_{ij}^{k} = \begin{cases} Q, & \text{if the ant used this line} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$



The constant Q is computed according to the quality of the ant’s solution. In this
way, an ant using the arc connecting node i to node j increases the probability that
ants will use the same arc in the future. The initially assigned pheromone trail $\tau_0$ avoids a
quick convergence of all the ants towards a sub-optimal path. An exploration mechanism
called the pheromone evaporation process is added by decreasing pheromone concentrations
in an exponential way, as given in Equation 4, at each iteration of the
algorithm [6, 9].

$$\tau \leftarrow (1 - \rho)\,\tau, \quad \rho \in (0, 1] \qquad (4)$$
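
The three rules above (stochastic node choice, pheromone deposit, and evaporation) can be sketched in a few lines of Python. This is only an illustrative sketch of the simple ACO scheme, not the authors' implementation: the function names, the dictionary layout of the pheromone table, and the use of one fixed `q` per call are assumptions, and the selection step normalizes the trail values so that they can be used as probabilities.

```python
import random

def choose_next(i, neighbors, tau, rng=random):
    """Pick the next node j in N_i with probability proportional to tau[(i, j)] (Eq. 1)."""
    candidates = neighbors[i]
    weights = [tau[(i, j)] for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def deposit(tau, paths, q):
    """Eqs. 2-3: each ant adds the amount q to every arc it used on its path."""
    for path in paths:
        for arc in zip(path, path[1:]):
            tau[arc] += q

def evaporate(tau, rho):
    """Eq. 4: tau <- (1 - rho) * tau, with rho in (0, 1]."""
    for arc in tau:
        tau[arc] *= (1.0 - rho)

# Hypothetical three-node graph with an initial trail tau_0 = 0.1 on every arc.
tau = {(0, 1): 0.1, (0, 2): 0.1, (1, 2): 0.1}
neighbors = {0: [1, 2], 1: [2]}
deposit(tau, [[0, 1, 2]], q=1.0)   # one ant walked 0 -> 1 -> 2
evaporate(tau, rho=0.5)
next_node = choose_next(0, neighbors, tau)
```

After the deposit, the arc (0, 1) carries more pheromone than (0, 2), so later ants at node 0 are more likely, but not certain, to choose node 1.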



3 Use of Trunk-Based External Memory in ACO 

The room environment is represented as a two-dimensional N×M grid, where N is the
number of rows and M is the number of columns. Single integer codes are used to 
identify walls, restricted minimum safe distance cells, obstacles, and free cells. 

The ideas forming the basis of the proposed approach were developed from the
results of a two-phase study. In the first phase, a simple ACO algorithm was applied to
the WFAR problem without any external memory. In its practical implementation,
ants construct their navigations by moving cell-to-cell based on the
pheromone concentrations within each cell, and ants are also allowed to deposit
pheromone inside cells that are visited during their navigations. The quality of the
obtained solution was not sufficiently high for two reasons. First, pheromone concentrations
within individual cells are not informative enough because a particular cell
may be simultaneously on both a high-quality and a low-quality path. Second, node
visibility causes confusion in the calculation of node selection probabilities. From
these two observations, which are supported by several experimental evaluations
[10], two important conclusions are drawn. First, an ant’s decision on the selection
of the next node to move to must be influenced by its previous selections. Second,
if the moves can be made along linear trunks, the interpretation of pheromone
concentration becomes much more meaningful because the first confusion
mentioned above is removed. These two conclusions are the main inspiration behind the
proposed approach, the details of which are described below.

The algorithm starts with an initialization process. The initial positions of the ants are
randomly set and a library of variable-size linear path segments, called trunks, is
generated. These trunks are used in the path construction phase of the artificial ants. Trunks
in the external memory are stored in a linked list structure. Each trunk in the trunk
memory has a trunk id, its associated start and end coordinates, a direction, a trunk
fitness, a path fitness, a flag for the best ant’s use, and an in-use flag. In order to
avoid the selection of the same path segments, ants are not allowed to use greedy
heuristics in their construction phase as in traditional ACO algorithms. The fitness
value of a trunk, which shows the quality of the trunk in its environment, is calculated
and recorded after its creation. The trunk id is used to track the trunks that are used
by each ant. At the end of each iteration, the fitness values of the constructed paths
are calculated and each path’s fitness value is recorded with all the trunks on
that path. If more than one ant uses the same trunk, then the highest path fitness among
them is kept with this trunk. If the path fitness of a trunk is less than the average path fitness
of the current iteration, then it is replaced by another randomly generated trunk.
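
As an illustration of the trunk memory described above, the following Python sketch models one trunk record and the two bookkeeping rules (keep the highest path fitness, replace below-average trunks). It is a minimal sketch under stated assumptions: the field names, the plain Python list used in place of the linked list, the `make_trunk` factory, and the choice to leave trunks without a recorded path fitness untouched are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Trunk:
    trunk_id: int
    start: Tuple[int, int]                 # (row, col) of the first cell
    end: Tuple[int, int]                   # (row, col) of the last cell
    direction: str                         # e.g. "N", "E", "S", "W" (assumed encoding)
    trunk_fitness: float = 0.0
    path_fitness: Optional[float] = None   # None until some ant uses the trunk
    used_by_best: bool = False
    in_use: bool = False

def record_path_fitness(trunk, fitness):
    """Keep the highest path fitness among all ants that used this trunk."""
    if trunk.path_fitness is None or fitness > trunk.path_fitness:
        trunk.path_fitness = fitness
    trunk.in_use = True

def replace_weak_trunks(library, average_path_fitness, make_trunk):
    """Replace every used trunk whose path fitness is below the iteration average."""
    return [
        make_trunk(t.trunk_id)
        if t.path_fitness is not None and t.path_fitness < average_path_fitness
        else t
        for t in library
    ]
```

A caller would pass its own random trunk generator as `make_trunk`; the sketch only fixes where the replacement decision is made.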

At the beginning of the algorithm, initial random positions are assigned to each ant. 
Each ant uses its initially assigned position throughout all iterations. The proposed 
ACO algorithm is iterated until max-iteration has been reached. At the beginning
of each iteration, the list of used trunks for each ant is initialized in order to provide
diversification in the construction of new paths in the coming iterations.

In order to construct a path, an ant first finds the set of memory elements having a
start position attribute equal to the ant’s initial position. Path construction consists of
two tasks that are repeated until it is terminated. First, based on the latest end
point coordinates on which the robot resides, the memory is searched for trunks having
appropriate starting point coordinates. If such trunks are available, then one
of them is selected using the Greedy Randomized Adaptive Search Procedure
(GRASP) method. If no such trunks are available, then γ new ones are generated such
that their starting position coincides with the robot’s latest coordinates. Then,
one element of the newly created list of trunks is selected and added to the path
under construction. The robot’s position is updated after each trunk addition. Second,
the memory is updated by replacing all trunks that are unused so far with the best subset
of the newly created ones. If there are no unused trunks, the subset of
newly created elite trunks is inserted at the end of the trunk library. The structure of
the trunk memory and the ant’s path construction phase are illustrated in Figure 1.

[Figure 1 panels: Ant’s Path Construction; External Trunk Memory. Legend: ■ Ant start position]

Fig. 1. Ant’s path construction phase and structure of the trunk memory
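
The paper does not spell out how the GRASP selection step is carried out, so the following Python sketch shows only the generic value-based form of GRASP: build a restricted candidate list (RCL) of the better candidates and pick one of them uniformly at random. The function name, the `score` callback, and the interpretation of the greedy parameter `alpha` (0 is purely greedy, 1 is purely random) are assumptions.

```python
import random

def grasp_select(candidates, score, alpha=0.5, rng=random):
    """Select one candidate from a restricted candidate list (higher score is better)."""
    scores = [score(c) for c in candidates]
    best, worst = max(scores), min(scores)
    # Keep every candidate whose score is within alpha * (best - worst) of the best.
    cutoff = best - alpha * (best - worst)
    rcl = [c for c, s in zip(candidates, scores) if s >= cutoff]
    return rng.choice(rcl)

# Hypothetical trunk fitness values; with alpha = 0.5 only 3, 4 and 5 can be selected.
picked = grasp_select([1, 2, 3, 4, 5], score=lambda c: c, alpha=0.5)
```

The randomization within the RCL is what keeps different ants from always choosing the same trunk, while the cutoff still biases the choice towards fitter trunks.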

Ants continue to construct new paths until no improvement is achieved in their
navigations. This criterion is applied by using a Euclidean distance measurement: the path
construction process is stopped when the average Euclidean distance of
the last three trunks is less than a threshold β; this means that the ant is stuck
around a localized region, hence its path construction is terminated. A general
description of the proposed ACO algorithm is given below:

Step 1. Initialization
   1.1 Initialize the trunk library with a number of randomly generated
       variable-size trunks
   1.2 Set the initial random position of each ant
Step 2. Loop until max-iteration has been reached
Step 3. ACO Algorithm
   3.1 Initialize the used trunk list of each ant
   3.2 Loop for each ant in the population
      3.2.1 Search the memory for trunks having an appropriate start position
      3.2.2 If the number of selected trunks < threshold γ then
               - Generate γ new random trunks with appropriate starting positions
                 and directions
               - Select one trunk from this list by using the GRASP algorithm
            Else
               - Select one trunk from the memory by using the GRASP algorithm
      3.2.3 If the termination criterion is met for the current ant, then go to step 3.2
            Else go to step 3.2.1
   3.3 Evaluate the ants’ fitness, and set the path fitness of each trunk
   3.4 Update the best ant’s navigation
   3.5 Update trunks that have a lower path fitness than the average path fitness
Step 4. Go to Step 2
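
The termination test used in step 3.2.3 can be sketched as follows. The text leaves some room for interpretation, so this sketch assumes that the "distance of a trunk" is the straight-line distance between its start and end cells and that an ant with fewer than three trunks is never considered stuck; the function name and the tuple layout are hypothetical.

```python
import math

def is_stuck(trunks, beta=3.0):
    """Return True when the mean Euclidean length of the last three trunks is below beta.

    trunks: list of ((row0, col0), (row1, col1)) segments in the order they were added.
    """
    if len(trunks) < 3:
        return False
    last_three = trunks[-3:]
    average = sum(math.dist(start, end) for start, end in last_three) / 3.0
    return average < beta

# Hypothetical path: the ant shuffles between neighboring cells, so it is stopped.
short_moves = [((1, 1), (1, 2)), ((1, 2), (2, 2)), ((2, 2), (2, 1))]
stuck = is_stuck(short_moves, beta=3.0)
```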





46 Mehtap Kose and Adnan Acan 



4 Results 

The robot is programmed to move only in the forward direction relative to its current
position. Therefore, three sensors that detect any obstacle or wall
cells located in front of it, to its left, and to its right are sufficient. The experiments
involve a robot moving in a world that consists of a 16x16 cell map. In the environment
map, the outer cells represent the walls of the room and the robot cannot occupy them.
The inner cells adjacent to the walls represent the minimum safe distance to the walls, and
the fitness of the robot is penalized when these cells are occupied during the navigation.
Finally, the inner cells that are adjacent to the minimum safe distance cells are the desired
cells, where the robot collects the highest fitness points.




[Figure 2 panels: Room 1, Room 2, Room 3, Room 4, Room 5]

Fig. 2. Five different room types used in the experiments



For each run of the proposed ACO algorithm for wall-following behavior, ants are
randomly placed within the allowable cells. The experiments were run independently
of each other in five different rooms, with zero to four protrusions from the walls.
The shapes of the rooms are illustrated in Figure 2.

The maximum number of iterations for each ACO run is set to 300. The ant
population is kept constant and the number of ants is set to 100. The termination
of an ant’s navigation is controlled by checking the average of the Euclidean distance
to the last three end positions. The minimum required traveled distance is taken
as 3 on average. Throughout the construction of the ants’ navigation, each ant selects
its next position from the trunk memory by using a GRASP algorithm. The algorithm
is stopped when 300 iterations have passed. All these parameters are summarized
in Table 1. The hit histograms that provide an overall picture of the number of
hits to the desired cells through the best ant’s navigation in each room are presented in
Figure 3. These results are obtained from twelve consecutive runs of the algorithm.

Table 1. List of parameters used in the ACO algorithm

Objective          To evolve a wall-following robot behavior for rooms having
                   increasing complexity.

Fitness cases      (500 × number of traveled desired cells)
                   - (10 × number of crashes) - total number of visited cells
                   Room 1: number of desired cells is 60
                   Room 2: number of desired cells is 64
                   Room 3: number of desired cells is 68
                   Room 4: number of desired cells is 72
                   Room 5: number of desired cells is 76

Selection          GRASP with a candidate list length of at least γ

Hits               The number of times the robot occupies the desired cells.
                   The robot is penalized if it occupies a cell already visited
                   before, or a minimum safe distance cell.

Parameters         Number of ants: 100, max-iteration: 300,
                   Greedy α = 0.5, γ = 7, β = 3.

Success predicate  Room 1: 60 hits    Room 4: 72 hits
                   Room 2: 64 hits    Room 5: 76 hits
                   Room 3: 68 hits



The horizontal axis of each histogram refers to the number of hits in the corresponding
room. The vertical axis is the frequency of runs for which the best ant’s
navigation achieved the specified number of hits during the execution. The details of
the successful navigations in the five different rooms are summarized in Table 2. Since
the program parameters and the desired number of cells are given in Table 1, this
information is not included in Table 2.



Table 2. Characteristics of the best runs in each room

Room      Iteration found   Visited desired cells   No. of crashes   Visited cells
Room 1    74                60                      0                60
Room 2    32                64                      0                64
Room 3    92                68                      2                72
Room 4    113               72                      6                80
Room 5    244               76                      40               123
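
As a worked example, the fitness expression listed in Table 1 can be applied to the best runs in Table 2. The Python sketch below assumes that the "Visited desired cells" column corresponds to the "number of traveled desired cells" in the fitness expression and that the crash weight is 10; the resulting values are computed here for illustration and are not reported in the paper.

```python
def path_fitness(desired_cells, crashes, visited_cells):
    """Fitness from Table 1: 500 * desired cells - 10 * crashes - visited cells."""
    return 500 * desired_cells - 10 * crashes - visited_cells

# (visited desired cells, crashes, visited cells) for the best run in each room (Table 2).
best_runs = {
    "Room 1": (60, 0, 60),
    "Room 2": (64, 0, 64),
    "Room 3": (68, 2, 72),
    "Room 4": (72, 6, 80),
    "Room 5": (76, 40, 123),
}

fitness_by_room = {room: path_fitness(*row) for room, row in best_runs.items()}
# Room 1: 500*60 - 10*0 - 60 = 29940;  Room 5: 500*76 - 10*40 - 123 = 37477
```

Because the reward for a desired cell is much larger than either penalty, the 40 crashes in Room 5 lower the fitness by only 400 points out of roughly 38000.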



The required wall-following behavior in this study is defined as follows. The
autonomous mobile robot is forced to visit all of the desired cells in the room environment
without occupying minimum safe distance cells, wall cells, or previously
visited cells. The robot can only distinguish the desired cells by using sensor
information. In its fitness assignment, the robot navigation is penalized when it tries to
occupy wall cells, minimum safe distance cells, or already visited cells. The already
visited nodes are identified from the list of used trunks of an ant.

Fig. 3. Hits histogram, (a) Room 1, (b) Room 2, (c) Room 3, (d) Room 4, (e) Room 5

The hit histograms demonstrate that the proposed ACO
algorithm does not get stuck at locally optimal navigations in the wall-following domain
and that it visits all of the desired cells in all rooms. None of the previously reported
studies performs as well as our implementation. Nevertheless, the overall algorithm
performance in Room 3 is comparable with the results obtained in [3]. Our proposed
algorithm found 8 successful solutions out of 12 trials, whereas 6 successful solutions
out of 61 trials were reported in [3]. Moreover, even though the environments used in our
simulations are similar to the ones used in [11], the problem domain studied there
is different: the aim in [11] was to evolve one GP program to be used in all rooms without
considering the measures of already visited cells and the smallest number of cell
occupations.



5 Conclusions 

A novel ACO strategy for the WFAR navigation problem is introduced. The proposed
approach uses an external trunk library to allow ants to exploit their accumulated past
experience and to remove the confusion caused by node-based pheromone deposition
strategies.

The algorithm is tested on five different room environments, in which all of the desired
cells are visited with the smallest number of occupations of redundant inner room cells.
As illustrated in Table 2, the number of crashes for the first two rooms is zero and
only the desired cells are visited during the navigations. In Rooms 3 and 4,
all the desired cells are visited with very few crashes and a quite small number
of redundant cell occupations. Room 5 is the most complicated problem environment,
with four protrusions; all the desired cells are still visited, but
the number of crashes and the number of occupied redundant cells are comparatively
high. The higher number of crashes and redundant cell occupations for Room 5 may be
caused by a weakness of our stopping criterion. However, it can be noted that the
proposed approach did not get stuck at locally optimal solutions in any of the tested
problem instances.

This approach needs further investigation from the following points of view:

• Application in dynamic problem environments, 

• Adaptive stopping criteria, 

• Using separate trunk libraries on individual ants.

Abstract. This paper investigates the issues of data properties with various
local, piecewise, global, mixture of experts (ME) and boundary-optimized
classifiers in medical decision making cases. A local k-nearest neighbor (k-NN),
piecewise decision tree C4.5 and CART algorithms, a global multilayer
perceptron (MLP), a mixture of experts (ME) algorithm based on the normalized
radial basis function (RBF) net and the boundary-optimized support vector
machines (SVM) algorithm are applied to three cases with different data sizes:
a stroke risk factors discrimination case with a small data size N, an antenatal
hypoxia discrimination case with a medium data size N and an intranatal
hypoxia monitoring case with a reasonably large data size N, each treated as an
individual classification case. Normalized RBF and MLP classifiers give good results in the
studied decision making cases. The parameter setting of SVM is adjustable to
various receiver operating characteristics (ROC).



1 Introduction 

In the past decade, there has been great activity in machine learning and intelligent
techniques for the development of models, applications, etc. These ideas have further
led to the reinterpretation of existing network structures, proposals of new net structures,
and novel learning algorithms based on optimization techniques, principles, and
criteria from these areas. In this field, there has also been a special interest in the
development of classification, clustering, regression, prediction, etc. for medical data
analysis problems. Remarkable efforts can be observed in applications such as
ECG, EEG, gynecology and obstetrics, neurology, etc. with new ideas such as neural
and fuzzy combinations [1], mixture of experts, ensembles of neural networks, support
vector machines (SVM), Bayesian nets, and many more.

This study investigates the issues of purely local, piecewise and global, ME and
boundary-optimized classification techniques in application to medical decision
making cases with small, medium and large sample (N) sizes. In purely local models
such as k-NN, the input space is hard partitioned into regions and the decision outcome
for each individual sample is obtained from its particular region. A piecewise
classification function, as in C4.5 and the classification and regression tree (CART),
consists of a patchwork of local classifiers that collectively cover the input space. In the C4.5
classifier, individual features are considered for partition, but in CART, a
combination of features decides the boundaries of class regions in the input space. By
contrast, in global classification as in MLP, a single classification function that is
generated by the sigmoidal units of the hidden layer must fit the data well everywhere with
no explicit partitioning of the input space and no subdivision of the parameter set. In the
single perceptron case, the classification function becomes a single hyperplane that
partitions the input space into two regions. As an ME model, the RBF net combines local
experts’ classification functions and makes a decision that is optimized in the
minimum mean squared error (MMSE) sense. The RBF net also interprets the individual
local classification functions of experts with a natural probabilistic association
(conditional expectation) through the output unit [2, 3]. Recently, a boundary-optimized
classification model, the support vector machine (SVM), has been proposed and used in
various applications [4]. The SVM maximizes the distance (margin) between the
closest data points (support vectors) and the decision hyperplane and thus optimizes the
decision rule for better performance.

In our study on medical data analysis, we consider three cases: risk factors of 
stroke for a small population of size N, hypoxia decision for antenatal cases for a 
medium population of size N and intranatal decision making for normal and hypoxic 
conditions for a fairly large population of size N. In the stroke case, a total of N=44 
diabetes mellitus (DM) patients are considered with 22 non-embolic stroke cases and 
22 without stroke. In antenatal follow-up, a total of N=210 patients are observed;
there were 67 hypoxic cases and the rest (143) were normal cases. In intranatal
monitoring, the total number of patients was 1537. Only 48 of them were observed to 
be hypoxic and the rest were normal delivery cases. 

The rest of the paper is organized as follows: In the next section, a general 
classification model is described and examples of local, global and piecewise 
classifiers are reviewed. In the third section, we present the details of three medical 
cases that employ the decision-making procedure. The fourth section describes the 
performance and time complexity results of the classifiers and discusses issues of
sample size N alongside the other effects. In the final section, concluding remarks for the
study are given.



2 Classification Models: k-NN, C4.5, CART, MLP, RBF, 
and SVM 

In this work we discuss classifier choice with regard to the data sizes in medical
decision making applications. Typically, the classifier design problem can be formulated
as follows: Given a set of input-output pairs $T = \{(x_i, y_i)\}$, where $x_i \in R^n$, $y_i \in R^m$, we
would like to design a mapping $f: R^n \rightarrow R^m$ that minimizes the classification error,
which, in the case of squared error, is given by $E[(y - f(x, w))^2]$. Note that the value of m
changes according to the number of classes.

A purely local classifier such as k-NN employs local information by using the
nearby samples of the unknown sample; thus it may be considered for a small number of
samples that describe the features of a region sufficiently well. The class membership
decision is obtained from a local input space. In the unweighted k-NN case, the
minimized mean squared-error cost can be defined as



E 





( 1 ) 



where E shows the error of the optimization process according to the given classes. A
minimum total distance to a particular class defines the class identity of the unknown
sample. Various windows, such as the Parzen window, may be used as the weight of the
distance. For medium and large data populations, local approaches can be
considered simple and effective classifiers; they produce the overall decision from
local decisions that use no specific rule. Local classifiers use no specific
parametric model; they depend only on prescribed distances between regional
samples.
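
To make the idea of a purely local classifier concrete, the following Python sketch implements the textbook unweighted k-NN rule with Euclidean distance and a majority vote among the k nearest training samples. It is not necessarily the exact variant used in this study, which describes the decision in terms of a minimum total distance to a class; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data in a 2-dimensional feature space.
train = [((0.0, 0.0), "normal"), ((0.0, 1.0), "normal"),
         ((5.0, 5.0), "hypoxic"), ((5.0, 6.0), "hypoxic"), ((6.0, 5.0), "hypoxic")]
label = knn_predict(train, (0.2, 0.1), k=3)
```

No parameters are fitted: the training set itself is the model, which is why such a classifier depends only on distances between samples.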

Piecewise classifiers such as C4.5 and CART consider the partition of decision 
feature space by a patchwork of local decision functions. Overall decision collectively 
covers the total input space. In the C4.5 case, individual features draw class borders in 
a staircase shape. There is no effect of the rest of the features on the particular 
decision border drawn by a specific feature. In the CART case, a combination of the 
features is used to construct the class decision regions. In both cases, the overall class 
decision is obtained in a piecewise continuous manner; in fact local borders between 
classes represent sub-regional rules. There is no specific unknown sample (center) for 
the border description. In summary, piecewise models fit the data in local regions and 
patch the local parameters in a divide-and-conquer sense. 

Global models like MLP employ a single classification function that must fit the
data well everywhere with no explicit partitioning of the input space and no 
subdivision of the parameter set. It thus becomes more difficult to describe the role of 
individual parameters. 

The popular MLP architecture with d inputs, h hidden units and one output is
formulated as follows:

$$y(x) = f\left(\sum_{j=1}^{h} v_j\, f\left(\sum_{i=1}^{d} w_{ji} x_i + w_{j0}\right) + v_0\right) \qquad (2)$$

f(.) is typically a smooth function such as the sigmoid function $f(v) = \frac{1}{1 + e^{-v}}$; w and v are
the weights, i.e., the parameters of the system. The back propagation (BP) algorithm and its
variations are used to train the structure (to optimize the weights of the system) [5, 6].
The sample population N must be large enough to train an MLP with a number of hidden
units. A single global parametric model is described for the overall region.
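
The forward pass of such a network (d inputs, h sigmoidal hidden units, one sigmoidal output) can be written directly from the formula. The Python sketch below is illustrative only: the argument names and the nested-list weight layout are assumptions, and training by back propagation is not shown.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp_forward(x, w, w0, v, v0):
    """One-hidden-layer MLP output.

    x:  input vector of length d
    w:  h rows of d hidden-layer weights, w[j][i]
    w0: h hidden-layer biases
    v:  h output weights
    v0: output bias
    """
    hidden = [
        sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j)
        for w_j, b_j in zip(w, w0)
    ]
    return sigmoid(sum(v_j * h_j for v_j, h_j in zip(v, hidden)) + v0)

# Hypothetical weights: with zero hidden weights every hidden unit outputs 0.5.
y = mlp_forward([3.0, -2.0], w=[[0.0, 0.0], [0.0, 0.0]], w0=[0.0, 0.0], v=[1.0, 1.0], v0=-1.0)
```

Every weight influences the output for every input, which is the sense in which the model is global.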

The statistically based ME model can be described for classification problems
as follows: we first define the local expert’s classification function $f(x, w_j)$, where $w_j$ is the
set of model parameters for the local model j. The local experts may be constant,
linear or nonlinear (polynomial of any degree) functions of x. The overall classification
function of the ME model is

$$f(x, w) = \sum_{j} P[j/x]\, f(x, w_j) \qquad (3)$$

where P[j/x] is a nonnegative weight of association between the input x and the expert
(or local model) j, and it determines the degree to which expert j contributes to the
overall model output. These weights are often called gating units [2, 3] and are
constrained to satisfy $\sum_j P[j/x] = 1$; each weight is a parametric function determined by a
parameter set $w_{gating}$. The statistical interpretation of the model is as follows: an input-output
pair $(x_i, y)$ is randomly selected by an input density and by a local model according to the
probability mass function {P[j/x]}. For a selected local model k, the output is generated as a
random variable whose mean is $f(x, w_k)$. From this viewpoint, f(x,w) represents the
expected value of the output for a given input x. It is well known that the conditional
expectation is the minimum mean-squared error (MMSE) classifier.

One main advantage of the ME model is that it creates a compromise among the purely
local model such as k-NN, which is valid on a particular region of the input space,
the piecewise model, such as a combination of rules for successive regions, and
the global model such as MLP. The hard partition of regions of the input space in
“purely local” models is extended by piecewise functions of successive linear
classifiers. In global models like MLP, a single classification function must classify
the data everywhere with no explicit partitioning of the input space. Here, one can see
the connection between ME models and the piecewise models: the piecewise models
divide the data into local regions and conquer the local parameters in a divide-and-conquer
sense. The ME model also decomposes the classification problem into the learning
of a set of local (expert) models, but none of them claims exclusive ownership of a
region, unlike in piecewise models. By using P[j/x] weights that are restricted to the
values {0,1}, parameters are added only where the fit in the local region needs to be
improved. Thus, the overall problem is simplified in terms of learning and modeling.
Furthermore, unlike piecewise modeling, which generates discontinuities at region
boundaries, ME functions are smooth everywhere due to the averaging in equation
(3).

In our study, we implemented the normalized radial basis function (RBF) net as the
ME structure. The normalized RBF uses a number of Gaussian functions to model the
local regions and summarizes the overall region at the output nodes with a class
decision. The normalized RBF net is an extension of the basic two-layer RBF net: in
the hidden layer, we compute the hidden outputs by

$$P[k/x] = \frac{R_k(x)}{\sum_{l} R_l(x)} \qquad (4)$$

where $R_k(x) = \exp\left(-\frac{\|x - m_k\|^2}{2\sigma^2}\right)$ are the commonly used Gaussian basis
functions with center (prototype) vectors $m_k$ and bandwidth $\sigma$. The index k
specifies the hidden node and l ranges over all hidden node indices in the
network. The second layer generates a linear combination

$$g(x) = \sum_{l} P[l/x]\, f(x, w_l) \qquad (5)$$

This architecture may be interpreted as an ME model where the weights {P[k/x]}
represent the probabilities of association with the corresponding local models
$\{f(x, w_l)\}$.
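
The two layers of the normalized RBF net can be sketched in Python as follows. The sketch assumes the standard Gaussian basis function exp(-||x - m_k||^2 / (2 sigma^2)) with one shared bandwidth, and it represents each local expert as a plain Python callable; these choices, like all names below, are illustrative assumptions rather than the implementation used in the study.

```python
import math

def gaussian_basis(x, center, sigma):
    """R_k(x) for one hidden node with prototype vector `center`."""
    return math.exp(-math.dist(x, center) ** 2 / (2.0 * sigma ** 2))

def gating_weights(x, centers, sigma):
    """Hidden layer: normalized basis outputs P[k/x], which sum to one."""
    r = [gaussian_basis(x, m, sigma) for m in centers]
    total = sum(r)
    return [r_k / total for r_k in r]

def normalized_rbf(x, centers, sigma, experts):
    """Second layer: g(x) = sum over l of P[l/x] * f_l(x)."""
    p = gating_weights(x, centers, sigma)
    return sum(p_l * f_l(x) for p_l, f_l in zip(p, experts))

# Two hypothetical constant experts (class scores 0 and 1) with prototypes on the x-axis.
centers = [(0.0, 0.0), (2.0, 0.0)]
experts = [lambda x: 0.0, lambda x: 1.0]
midpoint_output = normalized_rbf((1.0, 0.0), centers, sigma=1.0, experts=experts)
```

At the midpoint between the two prototypes both experts receive weight 0.5, and the output moves smoothly towards one expert as the input approaches its prototype, which illustrates the soft partition discussed above. For inputs very far from all prototypes the sum of basis outputs can underflow to zero, so a practical implementation would guard the division.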

Support vector machines (SVM) have been successfully employed for a number of
real-life problems [7]. They directly implement the principle of structural risk
minimization [7] and work by mapping the training points into a high-dimensional
feature space where a separating hyperplane (w, b) is found by maximizing the distance
from the closest data points (boundary optimization). Hyperplanes can be represented
in feature space, which is a Hilbert space, by means of kernel functions such as
Gaussian kernels

$$K(x', x) = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right).$$

Kernel functions are dot products between mapped pairs of input points $x_i$, i=1,...,p.
For input points $x_i$ mapping to targets $y_i$ (i=1,...,p), the decision function is
formulated in terms of these kernels,

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{p} \alpha_i y_i K(x, x_i) + b\right),$$

where b is the bias and $\alpha_i$ are the coefficients that are maximized by the Lagrangian:

$$L = \sum_{i=1}^{p} \alpha_i - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (6)$$

subject to the constraints $\alpha_i \geq 0$ and $\sum_{i=1}^{p} \alpha_i y_i = 0$. Only those points that lie
closest to the hyperplane have $\alpha_i > 0$ (support vectors). Linear (inner product) and Gaussian
kernel functions are selected in our experiments.
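
Once the coefficients and the bias have been found, evaluating the SVM decision function is straightforward. The Python sketch below shows only this evaluation step for the linear and Gaussian kernels; solving the optimization problem for the coefficients is not shown, and the support vectors, coefficients and bias in the example are hypothetical values chosen for illustration.

```python
import math

def linear_kernel(x1, x2):
    """Inner product kernel."""
    return sum(a * b for a, b in zip(x1, x2))

def gaussian_kernel(x1, x2, sigma):
    """K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x1, x2) ** 2 / (2.0 * sigma ** 2))

def svm_decide(x, support_vectors, alphas, targets, b, kernel):
    """Return +1 or -1 from sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    s = sum(a * y * kernel(x, sv) for a, y, sv in zip(alphas, targets, support_vectors))
    return 1 if s + b >= 0 else -1

# Hypothetical trained model with one support vector per class.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
alphas, targets, bias = [1.0, 1.0], [1, -1], 0.0
rbf = lambda u, v: gaussian_kernel(u, v, sigma=1.0)
decision = svm_decide((2.0, 0.0), support_vectors, alphas, targets, bias, rbf)
```

Only the support vectors enter the sum, so the cost of a decision grows with the number of support vectors rather than with the size of the training set.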



3 Medical Cases: Major Risk Factors of Stroke, Antenatal, 
and Intranatal Risk Assessment 

Various sizes of medical databases are used in today’s medical decision making 
applications [8, 9]. Each data set requires a proper choice of classifier due to the 
specific features such as size, application properties, etc. Using the classifiers that we 
discussed above, we present three medical cases with different features and with 
different sample sizes. A stroke risk factors discrimination case with small size N, an 
antenatal hypoxia discrimination case with medium size N and an intranatal hypoxia 
monitoring case with reasonably large size N are considered.

Fig. 1. 3-dimensional appearance of (a) stroke data, (b) antenatal data, (c) intranatal data



3.1 A Small Population (N) Case: Diabetic Patients with and without 
Non-embolic Stroke 

Using various diagnostic tests to search for evidence of disease is generally routine
for monitoring patients with diabetes mellitus (DM) [10, 11] (Figure 1(a)). Measuring
the major risk factors of ischemic stroke in patients with DM can also provide reasons for
the attending physician to initiate preventive measures adapted to the particular case.
Typical microvascular complications are neuropathy, nephropathy and retinopathy;
macrovascular complications are coronary artery diseases (CAD) and peripheral
vascular diseases (PVD). Abnormal test results for cholesterol, HDL and triglyceride
levels, FGL and RGL, and systolic and diastolic blood pressures are considered risk
factors of nonembolic-ischemic stroke for DM patients.

The study population of 44 patients was chosen according to the following glucose levels:
diabetes mellitus (DM) is diagnosed by a fasting glucose level (FGL) higher than 140 mg/dl and
random glucose levels (RGL) higher than 200 mg/dl in repeated measurements. The follow-up
data of 22 diabetic patients with ischemic stroke (non-embolic) and 22 diabetic
patients without stroke were collected over several years [11]. We use 7 metric
components (cholesterol, HDL, triglycerides, FGL, RGL, systolic and diastolic
pressures) in the experiments.



3.2 A Medium Size (N) Population Case: Antenatal Hypoxic Risk Assessment 

In normal pregnancy, impedance to flow in the umbilical artery (UA) decreases with
advancing gestation [12-14] (Figure 1(b)). Several studies have already demonstrated
the possibilities and limits of using umbilical Doppler for the assessment of fetal
growth. Some of these studies have used the systolic/diastolic (S/D) ratio, the
resistance index RI = (S-D)/S, or the pulsatility index PI = (S-D)/mean velocity
measurement on the UA Doppler velocity waveform. All of these UA resistance
indices, when greater than the upper limit of the normal range (>2 SD), are frequently
associated with growth retarded pregnancy or intrauterine growth retardation (IUGR).
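
For illustration, the three Doppler indices can be computed from a peak systolic velocity S, an end-diastolic velocity D and a mean velocity. The Python sketch below uses the standard definitions S/D, RI = (S - D)/S and PI = (S - D)/mean; the function name and the velocity values in the example are hypothetical and are not taken from the study data.

```python
def doppler_indices(s, d, mean_velocity):
    """Return the S/D ratio, resistance index and pulsatility index of a waveform."""
    return {
        "S/D": s / d,
        "RI": (s - d) / s,
        "PI": (s - d) / mean_velocity,
    }

# Hypothetical umbilical artery velocities in cm/s.
indices = doppler_indices(s=60.0, d=20.0, mean_velocity=32.0)
# S/D = 3.0, RI = 40/60, PI = 40/32 = 1.25
```

All three indices rise when the diastolic velocity falls relative to the systolic peak, which is why they are used together as indicators of increased impedance to flow.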

The 4 input values for the classifiers are weekly ultrasound values: PI, RI and the S/D
ratio from the UA, and WI. The WI is the normalized gestational age in weeks,
between 0 and 40. In the antenatal hypoxia experiments, we employ a total of 210
ultrasound measurements with WI [14]; 67 of them are hypoxic and the other 143 are
normal.



3.3 A Fairly Large (N) Population Case: Intranatal Acidosis Monitoring 

Persistent fetal hypoxemia can lead to acidosis and neurologic injury, and current
methods to detect fetal compromise are indirect and nonspecific. Theoretically, direct
continuous noninvasive measurement of fetal oxygenation is desirable to improve
intrapartum fetal assessment and the specificity of detecting fetal compromise. The
development of reflectance pulse oximetry has made it possible to measure fetal
oxygen saturation during labor [15, 16].

The data of this study (Figure 1(c)) consist of umbilical cord blood samples of
1537 live-born singleton neonates (48 of them hypoxic). The 6 input dimensions
consist of two oxygen saturation values (measured by spectrophotometry), two pH
values and two base excess values (from UA and UV) (measured by a pH and blood
gas analyzer). Preductal oxygen saturation was calculated with an empirical equation
[17]. Acidosis was defined as a UA pH below 7.09 or a base excess below -10.50 mmol/L.



4 Implementation Issues and Experimental Results 

The study populations contain three medical cases: a stroke database with 44 diabetic 
patients, an antenatal hypoxia database with 210 cases and an intranatal database with 
1537 cases. The performances are measured in terms of sensitivity and specificity. In
this evaluation, test samples fall into one of 4 categories: false positive (FP) if the
system decides positive while the sample is negative, and false negative (FN) if the system
decides negative while the sample is positive. These are false decisions; FP
is a vital error and should be avoided. The others are true positive (TP)
and true negative (TN), for which the system decides correctly.
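
The two performance measures follow directly from the four categories: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal Python sketch, with hypothetical function names and toy labels (1 for positive, 0 for negative), is given below.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of actual positives that the system detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives that the system correctly rejects."""
    return tn / (tn + fp)

# Hypothetical test set of five samples.
tp, fp, tn, fn = confusion_counts(actual=[1, 1, 0, 0, 0], predicted=[1, 0, 0, 0, 1])
sens, spec = sensitivity(tp, fn), specificity(tn, fp)
```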

In the small stroke population with two classes, we observe a good
success rate for the piecewise C4.5, global MLP and boundary-optimized SVM
approaches over the whole input space. The specificity and sensitivity results of the
other classifiers are inferior to those of C4.5, MLP and SVM. Multi-parameter
classifiers were found to significantly improve upon the classification performance of
single-parameter designs. Instead of drawing conclusions from a single parameter, we
employed the decision produced by the major risk factors, which gives the study
stronger arguments on the distinctive factors. The small sample size has an
acknowledged potential effect on the statistical power of the study, but we use
classification techniques such as C4.5 and MLP to help overcome this drawback. SVM
does not perform well with a linear kernel, and it cannot find optimized values
with Gaussian kernels. Since CART considers a combination of individual features, it
was not very successful for an input space of small size. Normalized RBF was also
unsuccessful since it summarizes the local regions of the input space at the output.
For C4.5, CART, MLP, normalized RBF and SVM, we implement a leave-n-out method
(jackknife, cross-validation) with n=4: 4 samples are used to test a classifier trained
with the remaining 40 samples. The experiments are repeated 4 times and the average
performance is reported. The results are shown in Table 1.
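
The leave-n-out protocol can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how the 4 held-out samples are chosen in each repetition, so random sampling with a fixed seed is assumed here, and `leave_n_out_splits`, `average_score` and `run_once` are hypothetical names introduced for illustration.

```python
import random


def leave_n_out_splits(num_samples, n, repeats, seed=0):
    """Yield (train_indices, test_indices) pairs, holding out n samples each time.

    Assumption: the held-out samples are drawn at random in every repetition.
    """
    rng = random.Random(seed)
    all_indices = list(range(num_samples))
    for _ in range(repeats):
        test_idx = sorted(rng.sample(all_indices, n))
        held_out = set(test_idx)
        train_idx = [i for i in all_indices if i not in held_out]
        yield train_idx, test_idx


def average_score(num_samples, n, repeats, run_once, seed=0):
    """Average the score returned by run_once(train_idx, test_idx) over all repeats."""
    scores = [run_once(train, test)
              for train, test in leave_n_out_splits(num_samples, n, repeats, seed)]
    return sum(scores) / len(scores)


# Stroke case from the text: 44 patients, n = 4 held out, 4 repetitions.
for train_idx, test_idx in leave_n_out_splits(44, 4, 4):
    print(len(train_idx), "training samples, test indices:", test_idx)
```

A caller would pass a `run_once` function that trains one of the classifiers on the training indices and returns its sensitivity or specificity on the held-out indices.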

Overall, the stroke data used in our study are a difficult classification case with a
small sample size N and a nonlinear separation function. This is also reflected in the
classifier performances.



Table 1. The test performance values and time complexities of the algorithms for diabetic
patients with and without non-embolic stroke

Classifier      Sensitivity  Specificity  TP    TN    CPU-time (secs)
k-NN            40%          40%          40%   40%   0.05
C4.5            63%          63%          63%   63%   0.7
CART            43%          45%          38%   50%   0.78
MLP             63%          63%          63%   63%   0.65
Norm. RBF       -            -            -     -     -
SVM (linear)    50%          50%          50%   50%   0.057
SVM (Gaussian)  -            -            -     -     -



In the medium-size (N) antenatal case with two classes (Table 2), MLP and normalized
RBF net yielded good results. The performance of k-NN is close to that of MLP and
normalized RBF net. The performance of CART is weak since it does not include the
effect of combination of parameters. It is furthermore observed that the antenatal data
is fairly easily classifiable, and the performance results support this conclusion.



Table 2. The test performance values and time complexities of the algorithms for antenatal
hypoxic risk assessment

Classifier      Sensitivity  Specificity  TP    TN    CPU-time (secs)
k-NN            89%          98%          96%   95%   0.05
C4.5            87%          97%          92%   94%   0.7
CART            86%          90%          70%   95%   0.98
MLP             95%          98%          96%   98%   0.65
Norm. RBF       96%          98%          95%   98%   0.70
SVM (linear)    97%          61%          69%   95%   0.065
SVM (Gaussian)  90%          51%          60%   86%   0.068



In the large-size (N) intranatal data with two classes (Table 3), MLP and
normalized RBF net perform well. CART has difficulty learning the space because
the number of hypoxic samples was not enough to train the CART structure. Finally,
k-NN and C4.5 perform weakly compared to MLP and normalized RBF. The
normalized RBF net describes the input space better since it first summarizes the
local spaces at the first layer and then optimizes them at the output layer in the
Bayesian sense. SVM with the parameter settings used (56 support vectors for linear
kernels and 93 support vectors for Gaussian kernels) gives inferior results. The
intranatal data has few hypoxic samples: only 48 hypoxic cases exist among a
total of 1537. This creates an unbalanced classification problem, in which the
information on the borders of the classes becomes valuable. The problem may be reduced
to a classification case of a different size by identifying the samples that carry no
information in the input space.

Finally, we compare the time complexities of the classifiers. The CPU times are
given in the fifth columns of Tables 1, 2 and 3. The k-NN and SVM algorithms are
observed to run fast compared to the other algorithms.



Table 3. The test performance values and time complexities of the algorithms for intranatal
acidosis monitoring

Classifier      Sensitivity  Specificity  TP    TN    CPU-time (secs)
k-NN            92%          94%          95%   90%   0.10
C4.5            92%          94%          95%   90%   0.76
CART            -            -            -     -     -
MLP             95%          100%         100%  97%   0.65
Norm. RBF       97%          93%          94%   96%   0.75
SVM (linear)    98%          83%          98%   82%   0.07
SVM (Gaussian)  95%          91%          99%   53%   0.07



5 Discussion 

We have studied three cases in medical decision making with six classifiers with
different properties: a k-NN that considers local feature vectors and defines a local
decision hyperplane, a C4.5 that considers individual features for a piecewise
hyperplane, a CART that considers the combination of the features for a linear
hyperplane, an MLP that considers overall features and introduces a global, nonlinear
decision surface, a normalized RBF that considers local, smoothed-piecewise models
(ME) with Bayesian conditional probabilities for an overall decision, and an SVM that
considers the support vectors of a hyperplane for a Lagrangian optimization.

It is known that an MLP generates a nonlinear decision surface. In our study this is
supported by our experiments. When the sample size N is small, a training problem
occurs, but this problem can be handled by strategies such as leave-n-out.

A normalized RBF net as an ME model offers new aspects over classifiers such
as k-NN, C4.5, CART and MLP, as it covers local feature space regions and patches
these regions with Bayesian probabilities. For example, a set of parameters of an MLP
that defines the nonlinear decision surface can be divided into partial sets of
parameters by a normalized RBF net. Each parameter set claims representation of a
part of the decision surface. It cannot be efficiently used for the stroke case since N is
very small. For large N, it is still a problem to efficiently employ a normalized RBF
net when class sizes are unequal and the size of one class is small. In the intranatal
hypoxia case, we face a very small class size (48 hypoxic cases), which is a typical
situation since most of the fetuses are healthy (1489 healthy cases). As a result,
the ME model (norm. RBF) also becomes useful since it defines an individual set of
parameters for each subspace. Also, specific ME structures [18] other than
normalized RBF are useful for various applications.

The advantageous aspect of SVM is that it introduces boundary-optimization
through a Lagrangian method using various kernel functions and thus produces
good decision regions by optimizing the data points that are closest to the decision
hyperplane. Various kernels such as linear, Gaussian, polynomial, etc. are available. In
the application to the three medical decision making cases, we observe the following. In
the stroke case with small and medium N with equal size of samples in each class, the
SVM works only for linear kernels, with a poor performance. In the intranatal case
with fairly large data size N and an unequal number of data points in each
class, it produces a good training performance, but the test performance was not very
good when compared to an MLP or a normalized RBF.

As a result, nonlinear MLP, normalized RBF (ME structure) and boundary-optimized
SVM are valuable in medical decision making applications when enough
data is available. For small N, the drawback of the small sample size can be mitigated
for an MLP by training strategies, as in our case. The statistically-based normalized
RBF net needs more data.

One advantage of the SVM structure is that we can control the sensitivity of the
structure for individual classification performances. In many medical applications,
medical specialists do not want to miss the false positive (FP) cases, in which the
decision system labels a case as positive while it is negative. This makes SVM-based
systems useful in many medical decision making applications.



Abstract. Inducing classification rules on domains from which information is
gathered at regular periods leads the number of such classification rules to be
generally so huge that selection of interesting ones among all discovered rules
becomes an important task. At each period, using the newly gathered
information from the domain, the new classification rules are induced.
Therefore, these rules stream through time and are so called streaming
classification rules. In this paper, an interactive rule interestingness-learning
algorithm (IRIL) is developed to automatically label the classification rules
either as “interesting” or “uninteresting” with limited user interaction. In our
study, VFP (Voting Feature Projections), a feature projection based incremental
classification learning algorithm, is also developed in the framework of IRIL.
The concept description learned by the VFP algorithm constitutes a novel
approach for interestingness analysis of streaming classification rules.



1 Introduction 

Data mining is the efficient discovery of patterns, as opposed to data itself, in large
databases [1]. Patterns in the data can be represented in many different forms,
including classification rules, association rules, clusters, sequential patterns, time
series, contingency tables, and others [2]. For example, inducing
classification rules on domains from which information is gathered at regular periods
leads the number of such classification rules to be generally so huge that selection of
interesting ones among all discovered rules becomes an important task. At each
period, using the newly gathered information from the domain, the new classification
rules are induced. Therefore, these rules stream through time and are so called
streaming classification rules.

In this paper, an interactive rule interestingness-learning algorithm (IRIL) is 
developed to automatically label the classification rules either as “interesting” or 
“uninteresting” with limited user interaction. In our study, VFP (Voting Feature 
Projections), a feature projection based incremental classification learning algorithm, 
is also developed in the framework of IRIL. The concept description learned by the 
VFP algorithm constitutes a novel approach for interestingness analysis of streaming 
classification rules. Being specific to our concerns, VFP takes the rule interestingness
factors as features and is used to learn the rule interestingness concept and to classify
the newly learned classification rules.

Section 2 describes the interestingness issue of patterns. Section 3 is devoted to
the knowledge representation used in this study. Sections 4 and 5 are related to the
training and classifying phases of the VFP algorithm. IRIL is explained in
Section 6. We give the experimental results in Section 7 and conclude in Section 8.



2 Interestingness Issue of Patterns 

The interestingness issue has been an important problem ever since the beginning of
data mining research [3]. There are many factors contributing to the interestingness of
a discovered pattern [3-5]. Some of them are coverage, confidence, completeness,
actionability and unexpectedness. The first three factors are objective, actionability
is subjective and unexpectedness is sometimes regarded as subjective [6-8] and
sometimes as objective [9,10]. Objective interestingness factors can be measured
independently of the user and domain knowledge. However, subjective
interestingness factors are not independent of the user and domain knowledge. The
measurement of a subjective interestingness factor may vary among users analyzing a
particular domain, may vary among different domains that a particular user is
analyzing and may vary even for the same user analyzing the same domain at
different times.

An objective interestingness measure is constructed by combining a proper subset
of the objective interestingness factors in a suitable way. For example, objective
interestingness factor x can be multiplied by the square of another objective
interestingness factor y to obtain an objective interestingness measure of the form xy^2.
It is also possible to use an objective interestingness factor x alone as an objective
interestingness measure (e.g. Confidence). Discovered patterns having Confidence >
threshold are regarded as “interesting”. Although the user determines the threshold,
this is regarded as small user intervention and the interestingness measure is still
assumed to be an objective one.
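
Both constructions in the paragraph above are simple enough to show directly. The sketch below is illustrative only: the function names, the factor values and the threshold of 0.7 are assumptions, not values from this study.

```python
def xy2_measure(x, y):
    """Objective measure of the form x * y^2 built from two objective factors."""
    return x * y ** 2


def interesting_by_confidence(confidences, threshold):
    """Label each pattern 'interesting' (True) when its Confidence exceeds the threshold."""
    return [confidence > threshold for confidence in confidences]


# Hypothetical factor values for a single rule.
print(xy2_measure(0.5, 0.8))

# Hypothetical confidences for three rules and a user-chosen threshold.
print(interesting_by_confidence([0.9, 0.6, 0.75], threshold=0.7))
```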

The existing subjective interestingness measures in the literature are constructed
upon the unexpectedness and actionability factors. Assuming the discovered pattern to
be a set of rules induced from a domain, the user gives her knowledge about the domain
in terms of fuzzy rules [8], general impressions [7] or rule templates [6]. The induced
rules are then compared with the user’s existing domain knowledge to determine
subjectively unexpected and/or actionable rules.

Both types of interestingness measures have some drawbacks. A particular
objective interestingness measure is not sufficient by itself [8]. Objective measures
are generally used as a filtering mechanism before applying a subjective measure. On
the other hand, subjective measures are sometimes used without prior usage of an
objective one. In the case of subjective interestingness measures, the user may not be
good at expressing her domain knowledge at the beginning of the interestingness
analysis. It would be better to automatically learn this knowledge based on her
classification of some presented rules as “interesting” or “uninteresting”. Another
drawback of a subjective measure is that the induced rules are compared with the
domain knowledge that addresses the unexpectedness and/or actionability issues.
Interestingness is assumed to depend on these two issues. That is, if a rule is found to
be unexpected, it is automatically regarded as an interesting rule. However, it would be
better if we learned a concept description that dealt with the interestingness issue
directly and if we benefited from unexpectedness and actionability as two of the
factors used to express the concept description. That is, the interestingness of a
pattern may depend on factors other than the unexpectedness and actionability issues.

The idea of a concept description that is automatically determined and directly
related to the interestingness issue motivated us to design the IRIL algorithm. The
concept description learned by the VFP algorithm, which was also developed in this
framework, constitutes a novel approach for interestingness analysis of classification
rules.

To ensure that the concept description is directly related to the rule interestingness
issue, some existing and newly developed interestingness factors that have the
capability to determine the interestingness of rules were used instead of the original
attributes of the data set. The current implementation of IRIL does not incorporate
the unexpectedness and actionability factors, so no domain knowledge is needed.
Although the interestingness factors are all objective in the current version of
IRIL, the thresholds of the objective factors are learned automatically rather than
expressed manually at the beginning. The values of these thresholds are based
upon the user’s classification results of some presented rules. So, although in the
literature subjectivity is highly related to the domain knowledge, IRIL differs from
these approaches: its subjectivity is not related to the domain knowledge. IRIL makes
use of objective factors (the current version makes use of only objective factors),
but for each such factor it subjectively learns what ranges of factor values (what
thresholds) lead to interesting or uninteresting rule classifications if only that factor is
used for classification purposes. That is, IRIL presents a hybrid interestingness
measure.

IRIL proceeds interactively. An input rule is labeled automatically if the learned
concept description can label the rule with high certainty. If the labeling or
classification certainty factor is not of sufficient strength, the user is asked to classify
the rule manually. The user looks at the values of the interestingness factors and labels
the rule accordingly. In IRIL, the concept description is learned or updated
incrementally by using the interestingness labels of the rules that are given on demand
by the user, either as “interesting” or “uninteresting”.



3 Knowledge Representation 

We think of a domain from which information is gathered at regular periods. For each 
period p, classification rules are induced from the gathered information and these 
streaming rules’ interestingness labeling seems to be an important problem. This 
labeling problem is modeled as a new classification problem and a rule set is 
produced for these rules. Each instance of the rule set is represented by a vector 
whose components are the interestingness factors having the potential to determine 
the interestingness of the corresponding rule and the interestingness label of the rule. 

The classification rules used in this study are probabilistic and have the following
general structure:

If (A_1 op value_1) AND (A_2 op value_2) AND ... AND (A_n op value_n) THEN
{Class_1: vote_1, Class_2: vote_2, ..., Class_k: vote_k}

A_i's are the features, Class_i's are the classes and op ∈ {=, ≠, <, ≤, >, ≥}.

The instances of the rule set have either “interesting” or “uninteresting” as the 
interestingness label, and have the interestingness factors shown in Table 1. In this 
new classification problem, these factors are treated as determining features, and 
interestingness label is treated as the target feature (class) of the rule set. 



Table 1. Features of the rule set

Feature                                       Short description and/or formula
Major Class                                   Class_i that takes the highest vote
Major Class Frequency                         Ratio of the instances having Class_i as the class label in the data set
Rule Size                                     Number of conditions in the antecedent part of the rule
Confidence with respect to Major Class        |Antecedent & Class_i| / |Antecedent|
Coverage                                      |Antecedent| / N
Completeness with respect to Major Class      |Antecedent & Class_i| / |Class_i|
Zero Voted Class Count                        Number of classes given zero vote
Standard Deviation of Class Votes             Standard deviation of the votes of the classes
Major Class Vote                              Maximum vote value distributed
Minor Class Vote                              Minimum vote value distributed
Decisive                                      True if Std.Dev. of Class Votes > s_min
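
As a rough illustration of three of the factors in Table 1, the sketch below computes Confidence, Coverage and Completeness for one rule over a small list of labeled instances. The instance layout, the `rule_factors` helper and the toy data are assumptions for illustration; in particular, the denominator of Completeness is taken to be the number of instances of the major class, which is the usual definition and is assumed here.

```python
def rule_factors(instances, antecedent, major_class):
    """Compute Confidence, Coverage and Completeness of a rule.

    instances:   list of (feature_dict, class_label) pairs (hypothetical layout)
    antecedent:  function feature_dict -> bool, True when the rule's conditions hold
    major_class: the class that takes the highest vote in the rule
    """
    n = len(instances)
    covered_labels = [label for feats, label in instances if antecedent(feats)]
    n_antecedent = len(covered_labels)                              # |Antecedent|
    n_both = sum(1 for label in covered_labels if label == major_class)
    n_class = sum(1 for _, label in instances if label == major_class)
    if n == 0 or n_antecedent == 0 or n_class == 0:
        raise ValueError("rule covers no instance or class is absent")
    return {
        "confidence": n_both / n_antecedent,     # |Antecedent & Class_i| / |Antecedent|
        "coverage": n_antecedent / n,            # |Antecedent| / N
        "completeness": n_both / n_class,        # assumed: |Antecedent & Class_i| / |Class_i|
    }


# Toy data set (hypothetical): one numeric feature and two classes.
data = [({"debt": 5}, "Loss"), ({"debt": 7}, "Loss"), ({"debt": 6}, "Profit"),
        ({"debt": 1}, "Profit"), ({"debt": 2}, "Loss")]
print(rule_factors(data, lambda feats: feats["debt"] > 4, "Loss"))
```

Here the rule "If debt > 4 THEN Loss" covers three of the five instances, two of which are Loss, and the data set contains three Loss instances in total.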



Each feature carries information about a specific property of the corresponding rule.
For instance, the class Class_i that takes the highest vote becomes the Major Class of
that classification rule. If we shorten the representation of any rule as “If Antecedent
THEN Class_i” and assume the data set to consist of N instances, we can define
Confidence, Coverage and Completeness as in Table 1. Furthermore, a rule is decisive
if the standard deviation of the votes is greater than s_min, whose definition is given in
the following equation:



$$ s_{min} = \frac{1}{(\text{Class Count} - 1)\,\sqrt{\text{Class Count}}} \qquad (1) $$

If a rule distributes its vote, ‘1’, evenly among all classes, then the standard deviation 
of the votes becomes zero and the rule becomes extremely indecisive. This is the 
worst vote distribution that can happen. The next worst vote distribution happens if 
exactly one class takes a zero vote, and the whole vote is distributed evenly among 
the remaining classes. The standard deviation of the votes that will occur in such a
scenario is called s_min.
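
The "next worst" vote distribution can be checked numerically. The sketch below assumes that the standard deviation of the votes is the sample standard deviation (division by Class Count - 1); under that assumption the deviation of the described distribution equals 1 / ((Class Count - 1) * sqrt(Class Count)). The function names are illustrative.

```python
import math
import statistics


def s_min(class_count):
    """Closed form of the threshold, assuming the sample standard deviation."""
    return 1.0 / ((class_count - 1) * math.sqrt(class_count))


def next_worst_votes(class_count):
    """One class gets a zero vote; the vote '1' is split evenly among the others."""
    share = 1.0 / (class_count - 1)
    return [0.0] + [share] * (class_count - 1)


def is_decisive(votes):
    """A rule is decisive if the standard deviation of its votes exceeds s_min."""
    return statistics.stdev(votes) > s_min(len(votes))


for k in (3, 4, 5):
    votes = next_worst_votes(k)
    print(k, statistics.stdev(votes), s_min(k))

print(is_decisive([0.9, 0.1, 0.0]))      # votes concentrated on one class
print(is_decisive([1 / 3, 1 / 3, 1 / 3]))  # evenly distributed vote
```

For each class count printed by the loop, the empirical standard deviation and the closed form agree up to floating-point error.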
4 Training in the VFP Algorithm



VFP (Voting Feature Projections) is a feature projection based classification-learning
algorithm developed in this study. It is used to learn the rule interestingness concept
and to classify the unlabeled rules in the context of modeling the rule interestingness
problem as a new classification problem.

The training phase of VFP, given in Figure 3, is performed incrementally. On a
nominal feature, the concept description is shown as the set of points along with the
numbers of instances of each class falling into those points. On the other hand, on a
numeric feature, the concept description is shown as the Gaussian probability density
functions for each class. Training can better be explained by looking at the sample
data set in Figure 1, and the associated learned concept description in Figure 2.



Fig. 1. Sample data set

Fig. 2. Concept description learned for the sample data set



The example data set consists of 10 training instances, having a nominal feature f_1
and a numeric feature f_2. f_1 takes two values, ‘A’ and ‘B’, whereas f_2 takes some
integer values. There are two possible classes: “interesting” and “uninteresting”. f_2 is
assumed to have Gaussian probability density functions for both classes.

VFP_train(t) /* t: newly added training instance */
begin
   let c be the class of t
   let others be the remaining classes
   if training set = {t}
      for each class s
         class_count[s] = 0
   class_count[c]++
   for each feature f
      if f is nominal
         p = find_point(f, t_f)
         if such a p exists
            point_class_count[f,p,c]++
         else /* add new point for f */
            add a new p' point
            point_class_count[f,p',c] = 1
            point_class_count[f,p',others] = 0
      else if f is numeric
         if training set = {t}
            μ_{f,c} = t_f, μ_{f,others} = 0
            μ²_{f,c} = t_f², μ²_{f,others} = 0
            σ_{f,c} = Undefined
            norm_density_func_{f,c} = Undefined
         else
            n = class_count[c]
            μ_{f,c} = (μ_{f,c} * (n-1) + t_f) / n /*update*/
            μ²_{f,c} = (μ²_{f,c} * (n-1) + t_f²) / n /*update*/
            σ_{f,c} = sqrt( (n / (n-1)) * (μ²_{f,c} - (μ_{f,c})²) )
            norm_density_func_{f,c} = (1 / (σ_{f,c} sqrt(2π))) * exp( -(x - μ_{f,c})² / (2 σ_{f,c}²) )
   return
      For numeric features: norm_density_func_{f,c} (∀f, c)
      For nominal features: point_class_count[f,p,c] (∀f, p, c)
end.

Fig. 3. Incremental train in VFP



In Figure 3, for a nominal feature f, find_point(f, t_f) searches for t_f, the new training
instance’s value at feature f, in the f projection. If t_f is found at a point p, then
point_class_count[f, p, c] is incremented, assuming that the training instance is of
class c. If t_f is not found, then a new point p' is constructed and
point_class_count[f, p', class] is initialized to 1 for class = c, and to 0 for
class = others. In this study, the features used in VFP are the interestingness factor
values computed for the classification rules, and the classes are “interesting” and
“uninteresting”.

For a numeric feature f, if a new training instance t of class c is examined, the
previous training instances of class c are taken to form a set P, and μ_{f,c} and σ_{f,c}
are taken to be the mean and the standard deviation of the f feature projection values
of the instances in P, respectively. μ_{f,c} and σ_{f,c} are updated incrementally.
Updating σ_{f,c} incrementally requires μ²_{f,c}, the mean of the squared projection
values, to be updated incrementally as well.
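
The incremental updates described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: `NumericStats` and `ConceptDescription` are hypothetical names, each numeric (feature, class) pair keeps its own instance counter (which equals class_count[c] when every instance supplies every feature), and the standard deviation is reported as undefined (None) until a class has two values.

```python
import math
from collections import defaultdict


class NumericStats:
    """Running mean, mean of squares and sample std for one (feature, class) pair."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0      # mu_{f,c}
        self.mean_sq = 0.0   # mean of squared values, needed to update sigma

    def update(self, value):
        self.n += 1
        n = self.n
        self.mean = (self.mean * (n - 1) + value) / n
        self.mean_sq = (self.mean_sq * (n - 1) + value * value) / n

    @property
    def std(self):
        if self.n < 2:
            return None      # undefined with fewer than two values
        variance = self.n / (self.n - 1) * (self.mean_sq - self.mean ** 2)
        return math.sqrt(max(variance, 0.0))   # guard against tiny negative round-off


class ConceptDescription:
    """Per-feature projections learned one training instance at a time."""

    def __init__(self, nominal_features):
        self.nominal = set(nominal_features)
        self.class_count = defaultdict(int)
        self.point_class_count = defaultdict(int)   # (feature, point, class) -> count
        self.numeric = defaultdict(NumericStats)    # (feature, class) -> running stats

    def train(self, instance, label):
        self.class_count[label] += 1
        for feature, value in instance.items():
            if feature in self.nominal:
                self.point_class_count[(feature, value, label)] += 1
            else:
                self.numeric[(feature, label)].update(float(value))


# Toy usage with one nominal feature f1 and one numeric feature f2 (hypothetical data).
cd = ConceptDescription(nominal_features=["f1"])
cd.train({"f1": "A", "f2": 2}, "interesting")
cd.train({"f1": "A", "f2": 4}, "interesting")
cd.train({"f1": "B", "f2": 9}, "uninteresting")
print(cd.point_class_count[("f1", "A", "interesting")])
print(cd.numeric[("f2", "interesting")].mean, cd.numeric[("f2", "interesting")].std)
```

Keeping the mean of squares alongside the mean is what allows the standard deviation to be refreshed without revisiting earlier instances.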



5 Classification in the VFP Algorithm 

Classification in VFP is shown in Figure 4. The query instance is projected on all
features. If a feature is not ready for querying, it gives zero votes; otherwise it gives
normalized votes. Normalization ensures that each feature has equal power in
classifying the query instances. For a feature to be ready for querying, it is required to
have at least two different values for each class.

The classification starts by giving zero votes to classes on each feature projection.
For a nominal feature f, find_point(f, q_f) searches whether q_f exists in the f projection.
If q_f is found at a point p, feature f gives votes as given in equation 2, and then these
votes are normalized to ensure equal voting power among features.



$$ \text{feature\_vote}[f,c] = \frac{\text{point\_class\_count}[f,p,c]}{\text{class\_count}[c]} \qquad (2) $$



In equation 2, the number of class c instances on point p of feature projection f is
divided by the total number of class c instances to find the class conditional
probability of falling into the p point. For a linear (numeric) feature f, each class gets
the vote given in equation 3. Normal probability density function values are used as
the vote values. These votes are also normalized.



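$$ \text{feature\_vote}[f,c] = \lim_{\Delta x \to 0} \int_{q_f}^{q_f+\Delta x} \frac{1}{\sigma_{f,c}\sqrt{2\pi}} \, e^{-\frac{(x-\mu_{f,c})^2}{2\sigma_{f,c}^2}} \, dx \qquad (3) $$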



The final vote for any class c is the sum of all votes given by the features. If there
exists a class c that uniquely gets the highest vote, then it is predicted to be the class of
the query instance. The certainty factor of the classification is computed as follows:

$$ C_f = \frac{\text{final\_vote}[c]}{\sum_{i=1}^{\#Classes} \text{final\_vote}[i]} $$

VFP_query(q) /* q: query instance */
begin
   for each feature f and class c
      feature_vote[f,c] = 0
      if feature_ready_for_query_process(f)
         if f is nominal
            p = find_point(f, q_f)
            if such a p exists
               for each class c
                  feature_vote[f,c] = point_class_count[f,p,c] / class_count[c]
               normalize_feature_votes(f)
         else if f is numeric
            for each class c
               feature_vote[f,c] = lim_{Δx→0} ∫_{q_f}^{q_f+Δx} (1 / (σ_{f,c} sqrt(2π))) * exp( -(x - μ_{f,c})² / (2 σ_{f,c}²) ) dx
            normalize_feature_votes(f)
   for each class c
      final_vote[c] = Σ_{f=1}^{#Features} feature_vote[f,c]
   if min_{i=1..#Classes} final_vote[i] < final_vote[k] = max_{i=1..#Classes} final_vote[i]
      classify q as "k" with a certainty factor C_f
      return C_f
   else return -1
end.

Fig. 4. Classification in VFP
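
The voting scheme of Figure 4 can be sketched as below. This is a simplified sketch under stated assumptions: the learned concept description is passed in as plain dictionaries (a hypothetical layout), the numeric vote is taken directly as the normal density value at q_f, and the feature_ready_for_query_process check is omitted, so the caller is assumed to pass only features that are ready for querying.

```python
import math


def normal_pdf(x, mean, std):
    """Value of the normal probability density function at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normalize(votes):
    """Scale the votes of one feature so that they sum to 1 (unless all are zero)."""
    total = sum(votes.values())
    if total == 0:
        return votes
    return {c: v / total for c, v in votes.items()}


def vfp_query(query, classes, class_count, point_class_count, numeric_stats):
    """Return (predicted_class, certainty) or (None, -1) when no unique winner exists.

    point_class_count: nominal feature -> {point -> {class -> count}}
    numeric_stats:     numeric feature -> {class -> (mean, std)}
    """
    final_vote = {c: 0.0 for c in classes}
    for feature, value in query.items():
        votes = {c: 0.0 for c in classes}
        if feature in point_class_count:
            counts = point_class_count[feature].get(value)
            if counts is not None:      # unseen points leave all votes at zero
                votes = {c: counts.get(c, 0) / class_count[c] for c in classes}
        elif feature in numeric_stats:
            votes = {c: normal_pdf(value, *numeric_stats[feature][c]) for c in classes}
        for c, v in normalize(votes).items():
            final_vote[c] += v
    best = max(final_vote.values())
    winners = [c for c in classes if final_vote[c] == best]
    total = sum(final_vote.values())
    if len(winners) != 1 or total == 0:
        return None, -1
    return winners[0], best / total


# Hypothetical concept description for features f1 (nominal) and f2 (numeric).
classes = ["interesting", "uninteresting"]
class_count = {"interesting": 4, "uninteresting": 6}
point_class_count = {"f1": {"A": {"interesting": 3, "uninteresting": 1},
                            "B": {"interesting": 1, "uninteresting": 5}}}
numeric_stats = {"f2": {"interesting": (10.0, 2.0), "uninteresting": (20.0, 3.0)}}
print(vfp_query({"f1": "A", "f2": 11.0}, classes, class_count,
                point_class_count, numeric_stats))
```

Because each feature's votes are normalized before being summed, the certainty factor of a unique winner lies between 1/#Classes and 1.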



6 IRIL Algorithm 

The IRIL algorithm, shown in Figure 5, needs two input parameters: R_p (the set of
streaming classification rules of period p) and MinC_t (the Minimum Certainty
Threshold). It tries to classify the rules in R_p. If C_f ≥ MinC_t for a query rule r, this
rule is inserted into the successfully classified rules set (R_s). Otherwise, two
situations are possible: either the concept description is not able to classify r
(C_f = -1), or the concept description’s classification (prediction of r’s
interestingness label) is not of sufficient strength. If C_f < MinC_t, rule r is presented,
along with its computed eleven interestingness factor values such as Coverage, Rule
Size, Decisive etc., to the user for classification. This rule, or actually the instance
holding the interestingness factor values and the recently determined interestingness
label of this rule, is then inserted into the training rule set R_t and the concept
description is reconstructed incrementally.

All the rules in R_p are labeled either automatically by the classification algorithm
or manually by the user. User participation makes the rule interestingness learning
process an interactive one. As the number of instances in the training rule set
increases, the concept description learned tends to be more powerful and reliable.
IRIL executes on the classification rules of all the periods and finally concludes by
presenting the labeled rules in R_s.



IRIL(R_p, MinC_t)
begin
   R_t ← ∅, R_s ← ∅
   if p is the 1st period //Warm-up Period
      for each rule r ∈ R_p
         ask the user to classify r
         set C_f of this classification to 1
         insert r into R_t
         VFP_train(r)
   else
      for each rule r ∈ R_p
         C_f ← VFP_query(r)
         if C_f < MinC_t
            ask the user to classify r
            set C_f of this classification to 1
            insert r into R_t
            VFP_train(r) //Update Concept Description
         else
            insert r into R_s
   return rules in R_s
end.

Fig. 5. IRIL algorithm
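
The control flow of Figure 5 can be sketched with the learner and the user abstracted as callbacks. This is a minimal sketch, not the authors' code: `iril`, `vfp_query`, `vfp_train` and `ask_user` are hypothetical callables supplied by the caller, and the function returns the user-labeled set alongside R_s for inspection, whereas Figure 5 returns only R_s. The toy learner at the bottom merely memorizes a single rule attribute and stands in for VFP.

```python
def iril(rules_p, min_ct, is_first_period, vfp_query, vfp_train, ask_user):
    """Process the rules of one period.

    vfp_query(rule) -> certainty factor, or -1 when the rule cannot be classified
    vfp_train(rule, label) -> updates the concept description incrementally
    ask_user(rule) -> "interesting" or "uninteresting"
    Returns (r_s, r_t): automatically classified rules and user-labeled rules.
    """
    r_t, r_s = [], []
    for rule in rules_p:
        # In the warm-up period every rule goes to the user.
        cf = -1 if is_first_period else vfp_query(rule)
        if is_first_period or cf < min_ct:
            label = ask_user(rule)
            r_t.append((rule, label, 1.0))   # user labels carry certainty 1
            vfp_train(rule, label)           # update the concept description
        else:
            r_s.append((rule, cf))
    return r_s, r_t


# Toy stand-ins (hypothetical): a learner that only remembers seen "bucket" values.
seen = {}

def toy_train(rule, label):
    seen[rule["bucket"]] = label

def toy_query(rule):
    return 0.9 if rule["bucket"] in seen else -1

def toy_user(rule):
    return "interesting" if rule["bucket"] == "high" else "uninteresting"

warm_up = [{"bucket": "high"}, {"bucket": "low"}]
later = [{"bucket": "high"}, {"bucket": "mid"}]
print(iril(warm_up, 0.51, True, toy_query, toy_train, toy_user))
print(iril(later, 0.51, False, toy_query, toy_train, toy_user))
```

In the second call, the rule whose bucket was seen during warm-up is classified automatically, while the unseen one is handed to the user and then used for training.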



7 Experimental Results 

The IRIL algorithm was tested to classify 1555 streaming classification rules induced
from a financial distress domain between the years 1989 and 1998. Each year has its
own data and classification rules, induced by using a benefit maximizing feature
projection based rule learner proposed in [11]. The data set of the financial distress
domain is a comprehensive set consisting of 25632 data instances and 164 determining
features (159 numeric, 5 nominal). There are two classes: “DeclareProfit” and
“DeclareLoss”. The data set includes some financial information about 3000 companies
collected during 10 years, and the class feature states whether the company declared a
profit or loss for the next three years. A domain expert previously labeled all the 1555
induced rules by an automated process to make accuracy measurement possible. Rules of
the first year are selected as the warm-up rules to construct the initial concept
description.

The results for MinC_t = 51% show that 1344 rules are classified automatically
with C_f ≥ MinC_t. User participation is 13% in the classification process. In the
classification process, it is always desired that rules are classified automatically and
user participation is low.
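
The user participation figures follow from the rule counts reported in Table 2. The short sketch below recomputes them, assuming that user participation is the share of the 1555 rules that were not classified automatically; the variable and function names are illustrative.

```python
TOTAL_RULES = 1555

# Rules classified automatically for each MinC_t value (in percent), from Table 2.
auto_classified = {51: 1344, 53: 1286, 55: 1196, 57: 1096}


def user_participation(total, automatic):
    """Share of rules that had to be labeled by the user."""
    return (total - automatic) / total


for min_ct, automatic in auto_classified.items():
    share = user_participation(TOTAL_RULES, automatic)
    print(f"MinC_t = {min_ct}%: {TOTAL_RULES - automatic} rules by user, {share:.1%}")
```

The computed shares are about 13.6%, 17.3%, 23.1% and 29.5%; truncated to whole percentages they give the 13%, 17%, 23% and 29% listed in Table 2.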

The accuracy values generally increase with MinC_t, because the higher MinC_t is,
the higher the user participation. Higher user participation in turn leads to learning
a more powerful and predictive concept description.

Table 2. Results for IRIL

                                        MinC_t   MinC_t   MinC_t   MinC_t
                                        51%      53%      55%      57%
Number of rules                         1555     1555     1555     1555
Number of rules classified
automatically with high certainty       1344     1286     1196     1096
User participation                      13%      17%      23%      29%
Overall Accuracy                        80%      82%      86%      88%



8 Conclusion 

IRIL, a feature projection based, interactive rule interestingness learning algorithm,
was developed and gave promising experimental results on streaming classification
rules induced on a financial distress domain. The concept description learned by the VFP
algorithm, also developed in the framework of IRIL, constitutes a novel approach for
interestingness analysis of classification rules. The concept description differs among
the users analyzing the same domain. That is, IRIL determines the important rule
interestingness factors for a given domain subjectively.



Abstract. We use a Fuzzy Petri Net (FPN) structure to represent
knowledge and model the behavior in our intelligent object-oriented 
database environment, which integrates fuzzy, active and deductive rules 
with database objects. However, the behavior of a system can be un- 
predictable due to the rules triggering or untriggering each other (non- 
termination). Intermediate and final database states may also differ ac- 
cording to the order of rule executions (non-confluence). In order to fore- 
see and solve problematic behavior patterns, we employ a static analysis 
on the FPN structure that provides easy checking of the termination 
property without requiring any extra construct. In addition, with our 
proposed fuzzy inference algorithm, we guarantee confluent rule execu- 
tions. The techniques and solutions provided in this study can be utilized 
in various complex systems, such as weather forecasting applications and 
environmental information systems. 



1 Introduction 

Knowledge intensive applications require an intelligent environment with deduction
capabilities. In such an application, there may be two reasons for deduction:
one is user queries and the other is events occurring inside or outside the system.
We introduce an intelligent object-oriented database environment in order
to fulfill the requirements of knowledge-intensive applications. In that, we in- 
tegrate fuzzy active and deductive rules with their inference mechanism in a 
fuzzy object-oriented database environment. After the incorporation of rules, 
our database system gains intelligent behavior. This allows objects to perceive 
dynamic occurrences or user queries after which they produce new knowledge or 
keep themselves in a consistent, stable and up-to-date state. We use Fuzzy Petri 
Nets (FPNs) to represent the knowledge and model the behavior of the system. 

Petri nets are considered as a graphical and mathematical modeling tool.
They are powerful in describing and studying information processing systems
that are characterized as being concurrent, asynchronous, distributed, parallel
and nondeterministic [8]. Several kinds of Petri nets have been investigated as
tools for representing rules in knowledge-based systems. The main advantage of
using Petri nets in rule-based systems is that they provide a structured
knowledge representation in which the relationships between the rules in the
knowledge base are easily understood, and they render a systemic inference
capability [4]. Considering the uncertain and imprecise knowledge existing in
various knowledge-intensive applications, the degree of truth of rules and
facts represented in a knowledge base is expressed as a real number in the
interval [0,1]. Fuzzy Petri Nets (FPNs) are formed [4] to handle such fuzzy
expressions.

C. Aykanat et al. (Eds.): ISCIS 2004, LNCS 3280, pp. 72-81, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

Using Fuzzy Petri Nets for Static Analysis of Rule-Bases

There have been a couple of approaches on alternative formulations of FPNs
to model the behavior of the system. However, due to the unstructured and
unpredictable nature of rule processing, rules can be difficult to program and
the behavior of the system can be complex and sometimes unpredictable. In an
active database, rules may trigger and untrigger each other, and the
intermediate and final states of the database can depend upon which rules are
triggered and executed in which order. In order to determine these undesirable
behavior patterns of the rule base, static rule analysis should be performed
[1]. Such analysis involves identifying certain properties of the rule base at
compile-time, which gives the programmer an opportunity to modify the rule base.

Two important and desirable properties of active rule behavior are
termination and confluence. These properties are defined for user-defined
changes and database states in a given rule set.

Termination: A rule set is guaranteed to terminate if, for any database state
and initial modification, rule processing does not continue forever.

Confluence: A rule set is confluent if, for any database state and initial
modification, the final database state after rule processing is unique, i.e.,
it is independent of the order in which activated rules are executed.

Static analysis techniques only give sufficient conditions for guaranteeing the
property searched for. For example, the identification of potential
non-termination in a set of rules indicates the possibility of infinite loops
at run time, while the identification of potential non-confluence indicates
that a rule base may exhibit nondeterministic behavior.

In this paper, we check properties of our system using the FPN structure. Our
FPN structure already contains the Triggering Graph information and supports
the static analysis of the rule base. Therefore, there is no need to do extra
work to construct a Triggering Graph as required in other studies [1,5,6,7,9],
which use structures other than FPNs for rule analysis. In addition, while
performing termination analysis, we also take the event and condition
compositions into account. We also guarantee confluent rule execution with the
fuzzy inference algorithm that we introduce in this paper.

The organization of this paper is as follows. In Section 2, we briefly define
our Fuzzy Petri Net model for fuzzy rules. In Section 3, after we give the
assumptions and understandings in Termination Analysis, we explain the details
of how we perform termination analysis on the FPN. These are followed by a
summary of the earlier work on confluence analysis in Section 4. We also explain
how we guarantee confluence in our model in this section. Finally, we make our
conclusions and state future work in Section 5.
2 A Fuzzy Petri Net Model for Fuzzy Rules

We introduce the following Fuzzy Petri Net (FPN) structure to model the fuzzy
rules: $(P, P_s, P_e, T, A, TT, TTF, AEF, PR, PPM, TV)$ where

i. $P$ is a finite set of places, in which

- $P_s \subset P$ is a finite set of input places for primitive events or conditions.

- $P_e \subset P$ is a finite set of output places for actions or conclusions.

ii. $T$ is a finite set of transitions.

iii. $A \subset (P \times T \cup T \times P)$ is a finite set of arcs for
connections between places and transitions.

iv. $TT$ is a finite set of token types.

v. $TTF : P \to TT$ is the token type function, mapping each place $\in P$ to a
token type $\in TT$.

vi. $AEF : Arc \to expression$ is the arc expression function, mapping each arc
to an expression.

vii. $PR$ is a finite set of propositions, corresponding to either events or
conditions or actions/conclusions.

viii. $PPM : P \to PR$ is the place-to-proposition mapping, where $|PR| = |P|$.

ix. $TV : P \to [0,1]$ is the truth value of the tokens assigned to places.

The FPN is represented as a directed graph with two types of nodes (places and
transitions) connected by arcs that specify the direction of information flow.
Places represent storage for input or output. Transitions represent activities
(transformations). A token represents a fragment of information which has a
type. Tokens are used to define the execution. An assignment of tokens to places
is called a marking. We model the dynamic behavior of fuzzy rule-based reasoning
with the evolution of markings. Every time an input place of the FPN is marked,
whether the corresponding transition(s) can fire has to be checked. A transition
can fire if and only if all its input places are marked. Since we provide
parameter passing, the token value of an output place is calculated from that of
its input places using the transition function. The firing of a transition leads
to the removal of tokens from the input places and the calculation and insertion
of tokens into the output places. Since we employ fuzziness, each token has a
membership value to the place it is assigned. This is part of the token and gets
calculated within the transition function.

Figure 1 shows how we realize the steps of Fuzzy Inference using the FPN
structure that we present above. During the FPN construction, first the rule
definitions are obtained from the user. For each rule, a rule object is created,
and the event, condition and action parts of the rule are examined. While doing
that, Fuzzy Petri Net places are created. Then, the fuzzy inference groups,
which are the concurrent rule sets that get triggered at the same time, are
determined according to their event, condition and action parts. Finally,
transitions are constructed over these FPN places. In order to determine which
rule triggers another one (i.e. which action execution or condition evaluation
generates new events), unification of condition and action calls with event
specifications is performed. During the FPN construction, the attributes of the
rule objects are updated to hold the related links on the FPN. Also, each FPN
place has a rule_set attribute in order to hold the rule objects that use the
FPN place.
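
The firing rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the place names, the tuple layout of a transition and the use of the minimum as transition function are assumptions made only for this example.

```python
from collections import deque

def fire(transition, places):
    """Fire `transition` if every input place holds a token; return True if fired."""
    inputs, outputs, func = transition
    if not all(places[p] for p in inputs):
        return False  # not enabled: at least one input place is unmarked
    # Remove the head token (a membership value in [0, 1]) from each input place.
    values = [places[p].popleft() for p in inputs]
    result = func(values)  # the transition function computes the output token
    for p in outputs:
        places[p].append(result)
    return True

# Hypothetical net: one transition joining an event place and a condition place.
places = {"event": deque([0.8]), "condition": deque([0.6]), "action": deque()}
# Assumed transition function: fuzzy AND, i.e. the minimum of the input values.
t = (["event", "condition"], ["action"], min)
fired = fire(t, places)
fired_again = fire(t, places)  # input places are now empty, so nothing happens
```

After the first call the token 0.6 sits in the action place and both input places are empty, which is why the second call does not fire.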
Fig. 1. Modeling fuzzy inference using fuzzy Petri nets.



3 Termination Analysis 

Termination for a rule set is guaranteed if rule processing always reaches a
state in which no rule is triggered. Several methods have been proposed in the
literature to perform termination analysis. One of them is building a triggering
graph by considering the type of triggering events and the events generated by
the execution of the rule actions [1,5,7]. A Triggering Graph [1] is a directed
graph $\{V, E\}$, where each node in $V$ corresponds to a rule in the rule base
and each edge $r_i \to r_j$ in $E$ means that the action of the rule $r_i$
generates events that trigger $r_j$. If there are no cycles in the triggering
graph, then processing is guaranteed to terminate. The triggering graph,
however, fails to take into account the details of the interaction between the
conditions and the actions of potentially non-terminating rules. That is,
although the triggering graph has a cycle, it may be the case that when a rule
in the cycle gets triggered, the condition of that rule is not true. As a
result, the rule is not executed and the cyclic chain is broken.
Consider for example the following rule:

R1:

ON update to attribute A of T
IF new value A > 10
THEN set A of T to 10
The triggering graph for the rule base involving R1 contains a cycle, as the
action of R1 updates the attribute A of T, which in turn triggers R1. However,
non-termination does not result, as the action of R1 assigns A a value for which
the condition of R1 never becomes true. It is to overcome this limitation in
triggering graphs that activation graphs have been introduced.
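
A small Python sketch can make the gap between the two views concrete. It assumes a triggering graph stored as a dictionary of successor lists and simulates R1 directly; the function names and the step limit are illustrative choices, not part of the original work.

```python
def has_cycle(graph):
    """Return True if the directed graph {rule: [triggered rules]} has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY  # node is on the current depth-first path
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: a cycle is closed
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))

def run_r1(a, max_steps=100):
    """Simulate R1 (ON update of A, IF A > 10 THEN set A to 10)."""
    steps, triggered = 0, True  # the initial update of A triggers R1
    while triggered and steps < max_steps:
        triggered = False
        if a > 10:            # condition of R1
            a = 10            # action of R1 updates A ...
            steps += 1
            triggered = True  # ... which triggers R1 again
    return a, steps

triggering_graph = {"R1": ["R1"]}  # R1 triggers itself
cyclic = has_cycle(triggering_graph)
final_value, executions = run_r1(42)
```

The graph test reports a cycle, yet the simulation executes the action of R1 only once, because the second evaluation of the condition fails.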

An Activation Graph [2] is built upon the semantic information contained in
the rule conditions and actions. It is a directed graph $\{V, E\}$, where each
node in $V$ corresponds to a rule in the rule base, each edge $r_i \to r_j$
($i \neq j$) in $E$ means that the action of the rule $r_i$ may change the truth
value of the condition of $r_j$ from false to true, and each edge $r_i \to r_i$
means that the condition of $r_i$ may be true after the execution of its own
action. If there are no cycles in the activation graph, then processing is
guaranteed to terminate. Some studies [3] mostly rely on the Activation Graph
while making termination analysis.

Other studies [2] try to detect potential non-termination by using the
triggering graph and the activation graph together: a rule set can only
exhibit nonterminating behavior when there are cycles in both the triggering
and the activation graphs that have at least one rule in common. Returning
to the example given above, the activation graph for rule R1 contains no cycle,
because its condition cannot be true after the execution of its action. Thus,
even though the triggering graph contains a cycle, execution of the rule
terminates.
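
The combined test can be sketched in Python as follows, assuming both graphs are given as dictionaries of successor lists. The helper names are hypothetical, and the check is a direct reading of the condition stated above rather than the algorithm of [2].

```python
def rules_on_cycles(graph):
    """Rules that can reach themselves by following one or more edges."""
    def reachable(start):
        seen, stack = set(), list(graph.get(start, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, []))
        return seen
    return {rule for rule in graph if rule in reachable(rule)}

def may_not_terminate(triggering, activation):
    """True only if some rule lies on a cycle in both graphs."""
    return bool(rules_on_cycles(triggering) & rules_on_cycles(activation))

# R1 triggers itself, but its action never makes its own condition true.
triggering = {"R1": ["R1"]}
activation = {"R1": []}
r1_result = may_not_terminate(triggering, activation)
```

For R1 the triggering graph has a self-loop while the activation graph has no edge, so the intersection is empty and termination is not put in doubt.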

While triggering graph arcs are syntactically derivable, it is very difficult to 
precisely determine the arcs of an activation graph. In [2], it is assumed that 
conditions and actions are both represented by relational algebra expressions. 
Unfortunately, for the host languages that are not based on relational algebra 
or relational calculus, it is difficult to infer the truth value of a condition from 
the imperative code of a rule’s action. 

Most of the existing studies on rule analysis only deal with simple rule
languages, i.e. languages that support only a limited set of constructs for
specifying active behavior. If the rules become complex (having composite
events, supporting complex conditions/actions, etc.), their analysis also
becomes complex. The reason is that there are more elements on which the
triggering of a rule depends. For example, a rule defined on a complex event is
triggered only when all component events occur. Therefore, compared to a simple
rule language, it is more difficult to decide when rules may generate an
infinite loop during execution. That is, in a system having only primitive
events, an edge in the triggering graph indicates that one rule can generate an
event that can in turn trigger another. However, where a rule has a composite
event, it may be that no other single rule in the rule base can trigger it, but
that a subset of the rules together may be able to trigger it. Only a very few
studies consider compositions while performing termination analysis [6,9].

3.1 Termination Analysis on the FPN 

Before we check the termination property of the system, we construct the FPN.
During the transition construction, we obtain the triggering graph information
from the event generating transitions. (An event generating transition
is an indication of one rule triggering another.) During the termination
analysis, once a cyclic path is detected, it may not be a true cycle due to one
or more rules in the cycle having composite events. That is, not all composing
events of a composite event may be triggered. We eliminate these types of false
cycles in the termination analysis. The algorithm whose explanation and
pseudocode are given below finds the true cycles.

An Algorithm for Determining True Cycles. We hold an $m \times m$ matrix of
rules, which is $D$. Each $d_{ij}$ entry holds the connectivity information
about the rules. The $d_{ij}$ entry is 1 if rule $r_i$ triggers $r_j$, meaning
there is an event generating transition from $r_i$ to $r_j$. Otherwise, its
value is 0. Let $D_k$ be an $m \times m$ matrix holding the matrix composition
result of $k$ $D$ matrices. It holds the connectivity information at a distance
of $k$ edges (or event generating transitions). Assuming there is a cycle, if we
have number_of_rules rules in the rule base, we can have at most
number_of_rules distinct event generating transitions to go through in our FPN.
This is due to the fact that there may be at most number_of_rules event
generating transitions that connect these rules. If the $i$th diagonal entry of
$D_k$ holds a value greater than 0, there is a cycle in $k$ steps (or at a
distance of $k$ event generating transitions). Notice that all rules taking
place in that cycle have values greater than 0 in their diagonal entries. If,
compared with the previous matrices that indicate a cyclic path, the same
diagonal elements have values greater than 0 at some $k$ steps, the matrix
composition should be terminated. If 0 is obtained for all entries of a matrix,
the matrix composition should again be terminated. This means that all edges
have been consumed and there is no further edge to go through, which is an
indication of no cyclic behavior.

While these $D$ matrices are calculated, a linked list of the rules that have
been passed through on the path is held in the $L$ matrix. That is, $L_k[i][j]$
holds a linked list of the rules passed through on the path from rule $r_i$ to
$r_j$. If the $i$th diagonal entry of $D_k$ holds a value greater than 0, we can
obtain the cyclic path elements at $L_k[i][i]$. Now it is easy to find true
cycles by checking the $[i][i]$ entry of $L_k$. If all the rules in the path
have only primitive events, this is certainly a true cycle. Otherwise (if any of
them has a composite event) the rule set of the primitive events of the rules is
examined. If they are all included in the cycle, again the cycle is a true
cycle. Otherwise it is not a true cycle. If there is at least one true cycle,
the user is informed about the situation together with the rules taking place
inside the cycle. Then it is the user's choice to change the definitions of
these rules. After that, the termination analysis is repeated.

DETERMINE_TRUE_CYCLES ALGORITHM
BEGIN
  calculate L1
  DO {
    FOR each place i
    { IF Dk(ii) is >= 1 where i is the index of the rule ri
        IF rules in the Lk(ii) are distinct
          IF none of the rules contains a composite event
            RETURN cyclic rules (Lk(ii))
          ELSE
            IF all rule sets of the primitive events of the rules
               (which have a composite event) are included in the Lk(ii)
              RETURN cyclic rules (Lk(ii))
            ELSE
              RETURN no cycle }
    k = k + 1
    calculate Dk
    calculate Lk
  } UNTIL (Dk = 0) OR (k > number_of_rules)
  RETURN no cycle
END
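
The procedure can also be sketched in runnable form. The Python function below is an illustration under stated assumptions, not the authors' code: rules are numbered from 0, the path lists stand in for the $L_k$ matrices, and the hypothetical `required` argument maps each rule with a composite event to the set of rules feeding its primitive events. Unlike the pseudocode, the sketch moves on to longer paths after rejecting a false cycle.

```python
def true_cycle(D, required):
    """Return the rules of the first true cycle found, or None.

    D[i][j] == 1 means rule i triggers rule j (0-based indices).
    required[r] is the set of rules that feed the primitive events of the
    composite event of rule r; rules with primitive events have no entry.
    """
    m = len(D)
    # paths[i][j] plays the role of L_k[i][j]; len(paths[i][j]) equals D_k[i][j].
    paths = [[[(i, j)] if D[i][j] else [] for j in range(m)] for i in range(m)]
    for _ in range(m):  # at most number_of_rules compositions
        for i in range(m):
            for path in paths[i][i]:
                rules = path[:-1]  # drop the repeated end point
                if len(set(rules)) != len(rules):
                    continue  # a rule is revisited: not a simple cycle
                if all(required.get(r, set()) <= set(rules) for r in rules):
                    return list(rules)
        # Compose with D once more: extend every path by one triggering edge.
        paths = [[[p + (j,) for l in range(m) if D[l][j] for p in paths[i][l]]
                  for j in range(m)] for i in range(m)]
        if not any(cell for row in paths for cell in row):
            break  # D_k = 0: no edges left to follow
    return None

# Rule 0 has a composite event fed by rules 0 and 1; rule 1 has a primitive event.
D1 = [[1, 1],
      [1, 0]]
cycle = true_cycle(D1, {0: {0, 1}})
```

With this input the self-loop on rule 0 is rejected because rule 1 is missing from it, and the two-step path through both rules is reported as the true cycle.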

Example 1. Suppose there are the following two rules:

R1:                                   R2:

ON (e11 and e21) threshold R1         ON e12 threshold R2

IF c1                                 IF c2

THEN a1                               THEN a2

In these rules, $a_1$ unifies with crisp event $e_1$ (which has $e_{11}$ and
$e_{12}$ as its fuzzy primitive events) and $a_2$ unifies with crisp event
$e_2$ (which has $e_{21}$ as its fuzzy primitive event). The dependencies and
the triggering graph are shown in Figure 2. Dashed arcs show the partial
triggering due to composite events and solid arcs show the total triggering.
Figure 3 shows the FPN constructed according to this rule set. For this example,
$D_1$ and $L_1$ are as follows:

$$D_1 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \qquad
L_1 = \begin{bmatrix} \{1,1\} & \{1,2\} \\ \{2,1\} & 0 \end{bmatrix}$$


Since the $D_1[1][1]$ entry has the value 1, there is a cycle. The rules
taking place in the cycle are found at the $L_1[1][1]$ entry, which is
$\{1,1\}$, meaning the rule $r_1$. The rules that pass through the primitive
event places of the rules in the cycle (which can be found at
$r_1$.PN_primitive_event.rule_set) are the rules $r_1$ and $r_2$. Since the
cyclic set $\{1,1\}$ does not contain $r_2$, this is not a true cycle. Therefore
it is excluded. Then $D_2$ is calculated and at the same time $L_2$ is obtained:

$$D_2 = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
L_2 = \begin{bmatrix} \{1,1,1;\ 1,2,1\} & \{1,1,2\} \\ \{2,1,1\} & \{2,1,2\} \end{bmatrix}$$

Since the $D_2[1][1]$ entry has a value greater than 0, there is a possibility
of a cycle. $r_1$.PN_primitive_event.rule_set is the rules $r_1$ and $r_2$.
Since the cyclic set $\{1,2,1\}$ contains both $r_1$ and $r_2$, it is a true
cycle. Termination analysis returns the rules $r_1$ and $r_2$, which take place
inside the cycle. These rules should be changed in order to break the cycle.


4 Confluence Analysis 

On each execution of the scheduling phase of rule processing, multiple rules may 
be triggered. A rule set is confluent if the final state of the database doesn’t 
depend on which eligible rule has been chosen for execution. 
Fig. 2. Dependencies and the Triggering Graph for Example 1

Fig. 3. Fuzzy Petri net of Example 1


The rule execution process can be described by the notions of rule execution
state and rule execution sequence. Consider a rule set $R$. A rule execution
state $S$ has two components: 1. a database state $d$, and 2. a set of triggered
rules $R_T \subset R$. When $R_T$ is empty, no rule is triggered and the rule
execution state is quiescent. A rule execution sequence consists of a series of
rule execution states linked by (executed) rules. A rule execution sequence is
complete if its last state is quiescent. A rule set is confluent if, for every
initial execution state $S$, every complete rule execution sequence beginning
with $S$ reaches the same quiescent state. Confluence analysis then requires the
exhaustive verification of all possible execution sequences for all possible
initial states. This technique is clearly infeasible even for a small rule set.

A different approach to confluence analysis is based on the commutativity of
rule pairs. Two rules $r_i$ and $r_j$ commute if, starting with any rule
execution state $S$, executing $r_i$ followed by $r_j$ produces the same rule
execution state as executing $r_j$ followed by $r_i$. If all pairs of rules in a
rule set $R$ commute, any execution sequences with the same initial state and
executed rules have the same final state. Then, it is possible to state a
sufficient condition to guarantee confluence of a rule set: A rule set $R$ is
confluent if all pairs of rules in $R$ commute [3].
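
The commutativity condition can be probed on sample states with a short Python sketch. Several simplifications are assumed here: a rule is modelled as a pure function on a database state (the set of triggered rules is ignored), and testing a finite list of states can refute commutativity but cannot prove it. All rule names are hypothetical.

```python
from itertools import combinations

def commute(rule_a, rule_b, states):
    """True if both execution orders give the same state for every sample state."""
    return all(rule_b(rule_a(s)) == rule_a(rule_b(s)) for s in states)

def all_pairs_commute(rules, states):
    """Sufficient condition for confluence, checked on the sample states only."""
    return all(commute(a, b, states) for a, b in combinations(rules, 2))

# Hypothetical rules acting on a state with two attributes, A and B.
def cap_a(state):
    return {**state, "A": min(state["A"], 10)}

def double_a(state):
    return {**state, "A": state["A"] * 2}

def bump_b(state):
    return {**state, "B": state["B"] + 1}

samples = [{"A": 8, "B": 0}, {"A": 20, "B": 3}]
independent = all_pairs_commute([cap_a, bump_b], samples)
conflicting = all_pairs_commute([cap_a, double_a], samples)
```

The first pair touches different attributes and commutes on the samples; the second pair does not, since capping and then doubling 8 gives 16 while doubling and then capping gives 10.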

Confluence may be guaranteed by imposing a total ordering on the active 
rule set [2]. If a total ordering is defined on the rules, when multiple rules are 
triggered only one rule at a time is eligible for evaluation. This provides a single 
rule execution sequence, which yields a unique final state, and confluence is 
guaranteed. 
4.1 Guaranteeing Confluent Executions in Our FPN Model

Our fuzzy inference algorithm based on our FPN is given below. If the rules
triggered are in the same fuzzy inference group, their execution is carried out
at the same time and the total outcome is computed. On the other hand, if the
rules triggered are not in the same fuzzy inference group, total ordering is
achieved according to their $\mu_s(r)$ (similarity to the current scenario)
values. Therefore, our inference algorithm based on FPN guarantees confluent
rule execution. The algorithm uses the following data structures:

a. $M$ is an $m \times 1$ column vector of places $p_i$, where $i = 1, \ldots, m$.
Each of its entries holds a data structure of two elements:

- the 1st element holds the number of tokens (current_marking),

- the 2nd element holds a linked list of current_marking many token values,
which works with a FIFO queuing mechanism (token_values).

b. $N$ is an $n \times 1$ column vector of transitions $t_j$, where
$j = 1, \ldots, n$. Each of its entries holds a transition function. Each
transition function uses the head elements of the input places' token_values
and produces an output to be added to the tail of the output places'
token_values.

c. $C^+ = (c^+_{ij})$ and $C^- = (c^-_{ij})$ represent the output and input
incidence matrices respectively, where $c^+_{ij}$ is 1 if there is an arc from
the transition $t_j$ to place $p_i$ and $c^-_{ij}$ is 1 if there is an arc from
the place $p_i$ to the transition $t_j$, and their values are 0 if there is no
connection.
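
As an illustration of how the input incidence matrix and the marking vector work together, the following Python sketch lists the transitions that are ready to fire. The layout is an assumption for the example: `c_minus[i][j]` mirrors $c^-_{ij}$, and each entry of the marking is a FIFO queue whose length is the current marking of the place.

```python
from collections import deque

def enabled_transitions(c_minus, marking):
    """Return indices j of transitions whose input places all hold a token.

    c_minus[i][j] == 1 means there is an arc from place p_i to transition t_j;
    marking[i] is the FIFO queue of token values currently held by place p_i.
    """
    m, n = len(c_minus), len(c_minus[0])
    enabled = []
    for j in range(n):
        inputs = [i for i in range(m) if c_minus[i][j]]
        if inputs and all(marking[i] for i in inputs):
            enabled.append(j)
    return enabled

# Three places and two transitions: t0 needs p0 and p1, t1 needs p2.
C_MINUS = [[1, 0],
           [1, 0],
           [0, 1]]
M = [deque([0.7]), deque([0.9]), deque()]
before = enabled_transitions(C_MINUS, M)
M[2].append(0.4)  # a token arrives at p2
after = enabled_transitions(C_MINUS, M)
```

Only the first transition is enabled at the start; once a token is queued at the third place, both transitions are.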

INFERENCE ALGORITHM
INPUT : start place Psi
OUTPUT: a set of end places Pe as the result of the inference
BEGIN
  find the index of the start place in M vector
  increase the number of tokens in the start place
  add a new token to the tail of the start place
  IF the rules triggered are in the same fuzzy inference group THEN
    set the active rule to any one of the rules triggered
  ELSE
    order the triggered rules according to their scenario similarity
    set the linked list of active rules according to this order
  WHILE ordered rules (i.e. linked list of active rules)
        are not consumed yet
  { check the transitions in N where the current place
    is connected as an input in input incidence matrix cij-
    DO for each one of them
    { IF the rules that use the transition
         are a subset of the ordered rules' head
        check the other input places of the transition in cij-
        IF they also have tokens to use
          fire the transition
          mark the output places in M using cij+
          remove the tokens from the input places in M using cij-
    } UNTIL an end place Pex is reached
    add the end place reached (Pex) to the set of outputs Pe
    update the ordered rules' head to the next }
  RETURN (Pe)
END
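
The ordering step is what makes the execution sequence unique, and it can be sketched in a few lines of Python. The similarity values below are invented for illustration, and breaking ties by rule name is an assumption added here so that the order stays total when two rules have equal similarity.

```python
def execution_order(triggered, similarity):
    """Total order: higher scenario similarity first, rule name as tie-break."""
    return sorted(triggered, key=lambda rule: (-similarity[rule], rule))

# Hypothetical similarity of each triggered rule to the current scenario.
similarity = {"r1": 0.4, "r2": 0.9, "r3": 0.4}
order_a = execution_order(["r3", "r1", "r2"], similarity)
order_b = execution_order(["r2", "r3", "r1"], similarity)
```

Both calls give the same sequence regardless of the order in which the triggered rules are presented, so only one rule execution sequence is possible.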

5 Conclusion 

The analytic capability of FPNs can help with checking the properties of a
system, which provides deeper insights into that system. In this paper, having
observed this ability of FPNs, we study the static analysis of our rule base.
Our FPN, once constructed, inherently contains the triggering graph information.
When the assumptions and the theories regarding termination analysis were first
put forward, event and condition compositions were neglected. Recently,
only a limited number of studies have considered compositions. However, in these
studies, performing termination analysis together with compositions involves
extra effort in dealing with complex structures and algorithms. In contrast, we
can handle event and condition compositions easily with the structures provided
by our FPN. As a result, our termination analysis algorithm is simple and easy
to understand. In addition, our fuzzy inference algorithm working on our FPN
assures confluent rule executions. This assurance comes from the fact that our
inference algorithm provides a total order within the rules using the similarity
of the rules to the current active scenario.

The techniques and solutions provided in this study can be utilized in various 
complex systems, such as weather forecasting applications and environmental 
information systems. 



References 

1. Aiken, A., Hellerstein, J., Widom, J.: Static analysis techniques for predicting the
behavior of active database rules. ACM TODS 20(1) (1995) 3-41

2. Baralis, E., Ceri, S., Paraboschi, S.: Improved rule analysis by means of
triggering and activation graphs. In: Proc. of RIDS'95. LNCS, Vol. 985.
Springer-Verlag (1995) 165-181

3. Baralis, E., Widom, J.: An algebraic approach to rule analysis in expert database
systems. In: Proc. of VLDB'94. (1994) 475-486

4. Chun, M., Bien, Z.: Fuzzy Petri net representation and reasoning methods for
rule-based decision making systems. IEICE Trans. Fundamentals E76-A(6) (1993)

5. Ceri, S., Widom, J.: Deriving production rules for constraint maintenance. In: Proc.
of VLDB'90. (1990) 566-577

6. Dinn, A., Paton, N., Williams, H.: Active rule analysis in the Rock & Roll deductive
object-oriented database. Information Systems 24(4) (1999) 327-353

7. Karadimce, A., Urban, S.: Refined triggering graphs: A logic-based approach to
termination analysis in an object-oriented database. In: ICDE. (1996) 384-391

8. Murata, T.: Petri nets: Properties, analysis and applications. Proc. IEEE 77(4)
(1989) 541-580

9. Vaduva, A., Gatziu, S., Dittrich, K.: Investigating termination in active database
systems with expressive rule languages. Technical Report, Institut für Informatik
(1997)




Protein Structural Class Determination Using 
Support Vector Machines 



Zerrin Isik, Berrin Yanikoglu, and Ugur Sezerman

Sabanci University, 34956, Tuzla, Istanbul, Turkey
zisik@su.sabanciuniv.edu, {berrin,ugur}@sabanciuniv.edu



Abstract. Proteins can be classified into four structural classes (all-α,
all-β, α/β, α+β) according to their secondary structure composition. In
this paper, we predict the structural class of a protein from its Amino
Acid Composition (AAC) using Support Vector Machines (SVM). A protein
can be represented by a 20 dimensional vector according to its AAC.
In addition to the AAC, we have used another feature set, called the Trio
Amino Acid Composition (Trio AAC), which takes into account the amino
acid neighborhood information. We have tried both of these features, the
AAC and the Trio AAC, in each case using an SVM as the classification
tool, in predicting the structural class of a protein. According to the
Jackknife test results, the Trio AAC feature set shows better classification
performance than the AAC feature.



1 Introduction 

Protein folding is the problem of finding the 3D structure of a protein, also
called its native state, from its amino acid sequence. There are 20 different types
of amino acids (labeled with their initials as: A, C, G, ...) and one can think
of a protein as a sequence of amino acids (e.g. AGGCT...). Hence the folding
problem is finding how this amino acid chain (1D structure) folds into its native
state (3D structure). The protein folding problem is a widely researched area,
since the 3D structure of a protein offers significant clues about the function
of a protein which cannot be found via experimental methods quickly or easily.

In finding the 3D structure of a protein, a useful first step is finding the 2D
structure, which is the local shape of its subsequences: a helix (called α-helix)
or a strand (called β-strand). A protein is classified into one of four structural
classes, a term introduced by Levitt and Chothia, according to its secondary
structure components: all-α, all-β, α/β, α+β [1,2]. An illustration of two of
these (all-α, all-β) is given in Figure 1.

The structural class of a protein has been used in some secondary structure
prediction algorithms [3,4,5]. Once the structural class of a protein is known, it
can be used to reduce the search space of the structure prediction problem: most
of the structure alternatives will be eliminated and the structure prediction task
will become easier and faster.

Fig. 1. The illustration of two structural classes. The one on the left is a protein
composed of only α-helices whereas the one on the right is composed of what is called
a β-sheet (formed by strands of amino acids).

During the past ten years, much research has been done on the structural
classification problem [6,7,8,9,10,11,12,13,14,15,16,17,18]. Chou [12] used the
amino acid composition of a protein and the Mahalanobis distance to assign a
protein to one of the four structural classes. Due to the high reported
performance, Wang et al. tried to duplicate Chou's work using the same data set,
without success [19]. More recently, Ding and Dubchak compared the
classification performance of ANNs and SVMs on classifying proteins into one of
27 fold classes, which are subclasses of the structural classes [17]. Tan and
coworkers also worked on the fold classification problem (for 27 fold classes),
using a new ensemble learning method [18].

These approaches typically use the Amino Acid Composition (AAC) of the
protein as the base for classification. The AAC is a 20 dimensional vector
specifying the composition percentage for each of the 20 amino acids. Although
the AAC largely determines structural class, its capacity is limited, since one
loses information by representing a protein with only a 20 dimensional vector.
We improved the classification capacity of the AAC by extending it to the Trio
AAC. The Trio AAC records the occurrence frequency of all possible combinations
of consecutive amino acid triplets in the protein. The frequency distribution of
neighboring triplets is very sparse because of the high dimensionality of the
Trio AAC input vector ($20^3$). Furthermore, one should also exploit the
evolutionary information which shows that certain amino acids can be replaced by
others without disrupting the function of a protein. These replacements
generally occur between amino acids which have similar physical and chemical
properties [20]. In this work, we have used different clusterings of the amino
acids to take these similarities into account and reduce the dimensionality, as
explained in Section 2.

In the results section we compare the classification performance of two feature
sets, the AAC and the Trio AAC. The classification performance of a Support
Vector Machine with these feature sets is measured on a data set consisting of 117
training and 63 test proteins [12]. The comparison of the two feature sets
shows that the high classification capacity of SVMs and the new feature
vector (Trio AAC) lead to much better classification results. Most work in this
area is not directly comparable due to different data sets or different numbers
of classes the proteins are classified into. We use the same data set used by
Chou [12] and Wang et al. [19], in order to be able to compare our results to
some extent.



2 Protein Structural Class Determination 

We have tried two approaches to classify a protein into one of the four
structural classes (all-α, all-β, α/β, α+β). A Support Vector Machine is used
with the feature sets of AAC and Trio AAC, which incorporates evolutionary and
neighborhood information to the AAC.

We preferred to use an SVM as the classification tool because of its
generalization power, as well as its high classification performance on the
protein structural classification problem [21,16,17]. The SVM is a supervised
machine learning technique that seeks an optimal discrimination of two classes
in a high-dimensional feature space. Superior generalization power, especially
for high-dimensional data, and fast convergence in training are the main
advantages of SVMs. SVMs are generally designed for 2-class classification
problems, whereas our work requires multi-class classification. We achieve
this with the one-against-one voting scheme of the LIBSVM software [22]. To
get good classification results, the parameters of the SVM, especially the
kernel type and the error-margin tradeoff (C), should be set appropriately. In
our work, Gaussian kernels are used, since they provided better separation
than polynomial and sigmoid kernels in all experiments. The value of the
parameter C was fixed during training and the same value was used during
testing. The best performance was obtained with C values ranging from 10 to
100 in the various tasks.
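
The one-against-one voting scheme can be illustrated with a minimal Python sketch. It is not the LIBSVM implementation: `pairwise_decide` is a hypothetical stand-in for a trained two-class SVM, and the one-dimensional class centers are made-up values that only serve to make the example runnable. One binary decision is taken for each of the six pairs of the four classes, and the class with the most votes wins.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]

# Hypothetical 1-D class centers; a real system would use trained SVMs.
CENTERS = {"all-alpha": 0.0, "all-beta": 1.0, "alpha/beta": 2.0, "alpha+beta": 3.0}


def pairwise_decide(a, b, x):
    """Toy binary classifier: pick whichever of the two classes is nearer to x."""
    return a if abs(x - CENTERS[a]) <= abs(x - CENTERS[b]) else b


def one_against_one_predict(x, classes=CLASSES, decide=pairwise_decide):
    """Run every pairwise classifier and return the class with the most votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    # max() returns the first maximum, so ties follow the order of `classes`.
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    print(one_against_one_predict(1.9))
```

With four classes, each prediction needs 4·3/2 = 6 binary decisions, which is why the scheme stays cheap for this problem.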

We used two different feature sets, the AAC and the Trio AAC, as the input 
vectors of the SVM. The PDB files were used to form both the AAC and the Trio 
AAC vectors for the given proteins [23]. After collecting the PDB files of proteins,
we extracted the amino acid sequence of each one. The amino acid sequences 
were then converted to the feature vectors as described in the following sections. 



2.1 AAC 

The AAC represents a protein with a 20-dimensional vector corresponding to the
composition (frequency of occurrence) of the 20 amino acids in the protein.
Since the frequencies sum to 1, leaving only 19 independent dimensions, the
AAC can also be used as a 19-dimensional vector.



\[
x = [x_1, x_2, \ldots, x_{20}], \tag{1}
\]

where x_k is the occurrence frequency of the k-th amino acid.
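
As a minimal sketch of Eq. (1), the composition vector can be computed from a one-letter amino acid sequence as follows. The alphabetical ordering of the 20 letters and the decision to skip non-standard symbols are assumptions made for this illustration, not details given in the text.

```python
# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = {a: 0 for a in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as 'X'
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Frequencies sum to 1, so only 19 of the 20 entries are independent.
    return [counts[a] / total for a in AMINO_ACIDS]


if __name__ == "__main__":
    print(aac("AACD")[:3])
```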



2.2 Trio AAC

The Trio AAC is the occurrence frequency of all possible consecutive triplets
of amino acids in the protein. Whereas the AAC is a 20-dimensional vector,
the Trio AAC vector, which captures the neighborhood composition of triplets
of amino acids, requires a 20×20×20 = 8000-dimensional vector (e.g. AAA, AAC,
...).

We reduce the dimensionality of the Trio AAC input vector using different
clusterings of the amino acids, which also take evolutionary information into
account. The amino acid clusters are constructed according to the
hydrophobicity and charge information of amino acids given by Thomas and Dill
[20]. We experimented with 5, 9, and 14 clusters of the amino acids, giving
Trio AAC vectors of 125 (5^3), 729 (9^3), and 2744 (14^3) dimensions,
respectively.
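
The construction of a clustered Trio AAC vector can be sketched as below. The five groups in `GROUPS` are a hypothetical partition chosen only for illustration; they are not the clusters of Thomas and Dill [20], which are not listed in the text. Each residue is mapped to its cluster index, and every window of three consecutive residues increments one of the k^3 entries.

```python
# Hypothetical 5-cluster partition of the 20 amino acids (illustration only).
GROUPS = ["AVLIMC", "FWY", "STNQGP", "KRH", "DE"]


def trio_aac(sequence, groups=GROUPS):
    """Return the k**3-dimensional frequency vector of clustered triplets.

    The sequence must contain only the 20 standard one-letter codes.
    """
    k = len(groups)
    cluster_of = {aa: i for i, group in enumerate(groups) for aa in group}
    ids = [cluster_of[aa] for aa in sequence.upper()]
    n_triplets = len(ids) - 2
    if n_triplets <= 0:
        raise ValueError("sequence must contain at least three residues")
    vec = [0.0] * (k ** 3)
    for a, b, c in zip(ids, ids[1:], ids[2:]):
        # Triplet (a, b, c) is stored at index a*k^2 + b*k + c.
        vec[(a * k + b) * k + c] += 1.0 / n_triplets
    return vec


if __name__ == "__main__":
    print(len(trio_aac("AVLDE")))
```

With 5 clusters the vector has 125 entries, so a protein of a few hundred residues already fills it far more densely than it would the 8000-dimensional unclustered vector.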



3 Results 

We have measured the performance of two algorithms: SVM with the AAC and
SVM with the Trio AAC. We have also compared our test results to another
structural classification work that applied the AAC feature set to the
same data set [19]. In all these tests, we have used a data set consisting of
117 training proteins (29 all-α, 30 all-β, 29 α/β, 29 α+β) and 63 test
proteins (8 all-α, 22 all-β, 9 α/β, 24 α+β) [12].

A protein is said to belong to a structural class based on the percentage of
its α-helix and β-sheet residues. In our data set, the data is labeled according
to the following percentage thresholds:

— α class proteins include more than 40% α-helix and less than 5% β-sheet
residues

— β class proteins include less than 5% α-helix and more than 40% β-sheet
residues

— α/β class proteins include more than 15% α-helix, more than 15% β-sheet,
and more than 60% parallel β-sheets

— α+β class proteins include more than 15% α-helix, more than 15% β-sheet,
and more than 60% antiparallel β-sheets.

Note that the remaining, less-structured parts of a protein, such as loops, are
not accounted for in the above percentages.
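
The labeling thresholds above can be written as a small decision function. This is a sketch of the rules as stated, not code from the data set's authors; in particular, it assumes that the 60% figures refer to the share of β-sheets that are parallel or antiparallel, and it returns `None` for proteins that match none of the four rules.

```python
def structural_class(helix, sheet, parallel, antiparallel):
    """Label a protein from secondary-structure percentages (0-100).

    helix, sheet           -- percentage of alpha-helix / beta-sheet residues
    parallel, antiparallel -- percentage of beta-sheets that are parallel /
                              antiparallel (interpretation assumed here)
    """
    if helix > 40 and sheet < 5:
        return "all-alpha"
    if helix < 5 and sheet > 40:
        return "all-beta"
    if helix > 15 and sheet > 15:
        if parallel > 60:
            return "alpha/beta"
        if antiparallel > 60:
            return "alpha+beta"
    return None  # none of the four rules applies


if __name__ == "__main__":
    print(structural_class(30, 25, 70, 30))
```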

3.1 Training Performance 

The term training performance is used to denote the performance of the
classifier on the training set. Specifically, the training performance is the
percentage of correctly classified training data once the training completes,
and it indicates how well the training data has been learned. Even though what
matters is the generalization ability of a classifier, training performances
are often reported for this problem, and we do the same for completeness.

The SVM achieved a training performance of about 99.1% for both sets of
features (96.% for all-β, 100% for the rest). Not achieving 100% separation on
the training data is quite normal; it just indicates that the data points may
not be linearly separable in the feature space produced by the kernel
function's mapping of the input space.

3.2 Test Performance 

Table 1 summarizes the test performance of the classifier on the test set (63 
proteins), after being trained on the training set (117 other proteins). The AAC 
and the Trio AAC are used as feature vectors for the SVM. 

The average test performances of the SVM using the AAC and the Trio AAC
are 71.4% and 66.6%, respectively. The performance of the SVM with the Trio
AAC feature was thus lower than with the AAC feature. This is likely
due to the high dimensionality of the input data relative to the size of
the training set: test points that fall in regions of the feature space not
represented in the training set can be misclassified. In this and all the
other tables, we report the performance of the Trio AAC using 9 clusters, as
that gave the best results.



Class Name    SVM AAC    SVM TrioAAC
all-α         62.5%      62.5%
all-β         77.2%      77.2%
α/β           100%       77.7%
α+β           58.3%      54.1%
Average       71.4%      66.6%

Table 1. Performance of the classifier on the test set. The AAC feature and the
Trio AAC (9 clusters) are used for the SVM
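
As a consistency check, the reported averages appear to be computed over all 63 test proteins rather than as the mean of the four class rates. The per-class counts below are inferred from the reported percentages and the class sizes of the test set (8, 22, 9, and 24 proteins); they are not stated in the text.

```latex
\[
\text{AAC: } \frac{5 + 17 + 9 + 14}{63} = \frac{45}{63} \approx 71.4\%,
\qquad
\text{Trio AAC: } \frac{5 + 17 + 7 + 13}{63} = \frac{42}{63} \approx 66.7\%
\]
```

The second value matches the reported 66.6% if the percentages are truncated rather than rounded.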



3.3 Test Performance Using the Jackknife Method 

The Jackknife test, also called the leave-one-out test, is a cross-validation tech- 
nique which is used when there is a small data set. In the Jackknife test, training 
is done using all of the data (train + test) leaving one sample out each time; 
then the performance is tested using that one sample, on that round of train-test 
cycle. At the end, the test performance is calculated as the average of the test 
results obtained in all the cycles. This method uses all of the data for testing, 
but since the test data is not used for the corresponding training phase, the 
testing is unbiased. 
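
The Jackknife procedure can be sketched in a few lines of Python. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in for the SVM; the helper names `train_centroids` and `predict_centroid` are hypothetical and only illustrate the train/predict interface the loop expects.

```python
def jackknife_accuracy(samples, labels, train, predict):
    """Leave-one-out accuracy: train on all samples but one, test on that one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_centroids(xs, ys):
    """Stand-in classifier (not an SVM): store the mean vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


if __name__ == "__main__":
    xs = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(jackknife_accuracy(xs, ys, train_centroids, predict_centroid))
```

For a data set of 180 proteins this means 180 training runs, which is affordable only because the data set is small.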

Table 2 displays the results of a Jackknife experiment using both the train
and test sets (117 + 63), in conjunction with the AAC and the Trio AAC.
According to these Jackknife test results, the SVM performs well: the average
classification rates are 85% and 92.7% for the AAC and the Trio AAC,
respectively. The 92.7% classification rate was achieved with the Trio AAC
constructed using 9 amino acid clusters.



Class Name    SVM AAC              SVM TrioAAC
              %      #             %      #
all-α         72.9   (27/37)       72.9   (27/37)
all-β         100    (52/52)       98     (51/52)
α/β           84.2   (32/38)       94.7   (36/38)
α+β           79.2   (42/53)       100    (53/53)
Average       85.0   (153/180)     92.7   (167/180)

Table 2. Jackknife test performance on (117+63) proteins, using the SVM with
the AAC and the Trio AAC (9 clusters) features

A second Jackknife test has been performed on only the 117 training
proteins in order to compare our results to the previous work of Wang and Yuan
[19], who also used the AAC feature as a base classifier. The results of both
works are shown in Table 3. According to these results, the average
classification performance of the SVM using the AAC (74.3%) is considerably
better than that of the other work (53.3%). The average classification rate of
the Trio AAC (84.6%) is better still.



Class Name    Wang et al.    SVM AAC    SVM TrioAAC
all-α         66.7%          75.8%      82.7%
all-β         56.7%          93.3%      93.3%
α/β           43.3%          71.4%      89.2%
α+β           46.7%          55.1%      72.4%
Average       53.3%          74.3%      84.6%

Table 3. Jackknife test performance on 117 proteins (the training set only). This
experiment was done to compare our results to a previous work of Wang and
Yuan (given in the first column), who also used the AAC feature in the Jackknife
test on the same proteins [19]. Our results, obtained by the SVM method using
the AAC or the Trio AAC, are given in the second and third columns



4 Summary and Discussion

Despite years of research and the wide variety of approaches that have been
utilized, protein folding remains an open problem. Today the problem is
approached from many different directions and divided up into smaller
tasks, such as secondary structure prediction, structural class assignment,
and contact map prediction.

In this study, we addressed the structural classification problem and
compared the performance of Support Vector Machines using the AAC and the Trio
AAC features. The comparison of the two feature sets shows that the Trio AAC
provides an 8-10% improvement in classification accuracy (see Tables 2 and 3).
We experimented with 5, 9, and 14 clusters of the amino acids, giving Trio AAC
vectors of increasing lengths; the experiment with 9 clusters gave the highest
classification performance. The better performance of the Trio AAC supports
our assumption that neighborhood and evolutionary information contributes
positively to classification accuracy. As expected, we also obtained
better classification rates with more training data: for both the AAC and the
Trio AAC features, the Jackknife test on all 180 proteins (Table 2) gives
higher rates than the one on the 117 training proteins (Table 3).

In the literature, there are two studies that use feature vectors similar to
the Trio AAC in different domains, namely remote homology detection and the
amino acid neighboring effect [24,25]. We recently became aware of two other
studies. Markowetz et al. use feature vectors similar to the Trio AAC, but
without the idea of using amino acid clusters to reduce dimensionality [26].
In that work, 268 protein sequences are classified into a set of 42
structural classes with a 78% performance in cross-validation tests. Cai et
al. use a Support Vector Machine as the classification method and the amino
acid composition as the feature set, and report an average classification
performance of 93% for a set of 204 proteins [16]. However, these results are
not directly comparable to ours because of differences in the number of
structural classes or in the data sets.

In summary, we devised a new and more complex feature set (Trio AAC)
incorporating neighborhood information in addition to the commonly used amino
acid composition information. The higher classification rates indicate that the
combination of a powerful tool and this new feature set improves the accuracy
of structural class determination.



Abstract. In this paper a novel approach is proposed to present crack-like
patterns on the surface of 3D objects. Instead of simulating the physical
processes or using texturing techniques, a vectorized crack-like pattern is
used. Given a crack-like pattern, basic image processing operations are
applied to extract the feature pattern, and redundant vertices are removed in
a simplification process. The pattern is transformed onto the surface of the
paraboloid bounding volume of the input 3D object and then projected onto the
object’s surface according to a set of given projection reference points.
Based on the reference point generation mechanism, crack-like patterns can be
presented effectively and flexibly on 3D objects. With the proposed
hardware-accelerated mechanism using the stencil buffer, interactive pattern
presentation can be achieved on the fly.



1 Introduction 

Natural phenomena are difficult to formulate, and crack patterns are no
exception. Many methods focus on generating crack patterns. Approximate
simulation produces good visual results but consumes a huge amount of
computation time. Alternatively, texturing is used to map a crack pattern onto
a given 3D object. However, texturing has inherent problems, such as the
memory required for texture libraries, orientation and scale consistency, the
demand for seamless texturing, and the animation of crack propagation, so it
is not a straightforward solution. In order to quickly generate various
cracks on 3D objects for non-scientific purposes such as games and art, an
approach for interactive crack-like pattern presentation on 3D objects is
proposed in this paper.

Our approach is non-physical, which means that it does not simulate crack
propagation with physical models. A feature pattern is extracted from various
crack-like patterns, ranging from natural patterns to manually drawn patterns,
procedural patterns, and so forth, by using basic image processing
operations. By tracking the pixel connectivity in the pattern image, the
input is vectorized and simplified into a planar graph, called the feature
pattern. To present the crack pattern on the object’s surface, the planar
graph is first transformed onto the surface of the paraboloid bounding volume
of the target object. Then the transformed graph is projected onto the surface
of the 3D object. The projection calculates the intersections between the 3D
object and the quads formed by the edges of the transformed graph on the
paraboloid bounding volume and the corresponding projection reference points.
With the proposed hardware acceleration technique, feature patterns can be
presented on the surface of the object without additional intersection
calculations.

Briefly, we summarize the contributions of the proposed approach as follows.

-Crack Pattern: In our approach, the input is simply a 2D image. The crack
pattern can be any crack-like pattern, such as realistic crack patterns from
natural patterns, various sets of crack-like patterns from man-made patterns,
and procedural patterns. We also propose a simple mechanism to obtain the
feature crack pattern efficiently and automatically. By contrast, crack
textures used in other non-physical approaches need more manual work to
obtain ideal ones, and in the physical approaches the simulated crack
patterns are probably realistic but the physical model used for simulation is
computation intensive.

-Reference Points Generation: Reference points are automatically generated
according to the shape distribution of the target 3D object to approximate a
suitable projection. With three user-controlled parameters, the user can
interactively specify the distribution of reference points to produce the
desired crack pattern presentation on the 3D object on the fly.

-Rendering: Interactive presentation and control of the feature pattern on
the surfaces of 3D objects are made feasible by a hardware-accelerated
mechanism. The feature pattern can be carved into the 3D object’s surface to
change the visual geometric appearance of the object.

The rest of the paper is organized as follows. Related work is surveyed
in Section 2. In Section 3, we present the framework and describe each of its
stages in detail. The experimental results are illustrated in Section 4.
Finally, we conclude the proposed approach and point out some future work in
Section 5.



2 Related Work 

There are two classes of research on cracks. One comprises the physical
approaches. These approaches propose realistic models but are largely
restricted by huge computation times and often result in poor images. The
proposed simulations focus on models of the energy, forces, and structure of
fractures, such as [1,2] using the finite element method, [3] studying the
crack pattern of mud, [4] using a spring network, and [5] animating fracture
by physical modeling.

The other class comprises the semi-physical and non-physical approaches. They
focus on low-cost computation of crack propagation or on visual effects.
Semi-physical approaches simplify the stress model of objects and use simple
rules to simulate the propagation of cracks. For example, Gobron and Chiba [6]
simulated the crack pattern based on a special structural organization and
rules of propagation. Non-physical approaches analyze crack patterns or
sample a large number of images, and use methods such as texture mapping to
render the crack patterns. Dana et al. [7] and Chiba et al. [8] could display
crack patterns realistically using texture mapping. Neyret and Cani [9]
produced predefined crack patterns on 3D objects using texturing; this
approach created almost seamless 3D crack patterns. Soler et al. [10] could
synthesize textures on arbitrary surfaces from only one texture pattern, such
as a crack pattern. Wyvill et al. [11] generated and rendered cracks in batik
on clothes for art.

Fig. 1. Conceptualized framework

3 Pattern Extraction and Presentation 

The framework of crack-like pattern presentation is conceptualized in Figure 1. 
It consists of the following stages: 

— Vectorizing the input 2D crack-like pattern and generating the corresponding
planar graph, called the feature pattern,

— Transforming the planar graph onto the surface of the paraboloid bounding
volume of the target 3D object,

— Generating a set of reference points according to the shape distribution of
the target 3D object,

— Projecting the feature pattern onto the 3D object’s surface toward the
corresponding reference points,

— Interactively modifying projection parameters and rendering on the fly.



3.1 Vectorizing Crack-Like Pattern 

Various crack-like patterns can serve as input images. For example, they can
be natural patterns, such as tree branches, veins of leaves, fractures of
rocks, dry mud, and china. They can be manually drawn patterns, such as
artistic patterns and maps. Even procedural patterns, such as Voronoi
diagrams, Koch curves, and fractal patterns, can be inputs. Basic image
processing operations [12], like bi-leveling, segmentation, and a thinning
mechanism [13], are applied to extract the raw feature pattern, which consists
of discrete pixels. Then a simple greedy algorithm is used to construct the
planar graph (feature pattern) from the raw feature pattern (thinned input
pattern).





Flexible and Interactive Crack-Like Patterns Presentation on 3D Objects 



Fig. 2. Vectorizing a crack-like pattern: (a) a sample input image, (b) the raw
feature pattern (thinned image), (c) a vectorized planar graph shown by
associating a width with each edge. Note that each edge is randomly assigned a
color



The feature extraction process consists of two stages. In the first stage, pixels
in the raw feature pattern are grouped into short, straight line segments,
called raw edges. This grouping process starts from seed pixels with degree ≥ 3,
and then continues with seed pixels of degree 1 and 2. The pixels in a raw edge
(line segment) are all adjacent and have degree 2, except for the two end
pixels, and the number of pixels, i.e., the length of a raw edge, is bounded.
These raw edges form the raw planar graph, which is usually full of tiny edges
and needs to be simplified.

In the second stage, redundant edges are removed. Let V_q be a vertex with
degree 2, and let E_pq and E_qr be the two edges connected to V_q. Edges E_pq
and E_qr can be replaced by a single edge E_pr if the bending angle between
them is less than a given threshold. All vertices in the raw graph are checked
to see if this edge substitution is applicable.
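
The second stage can be sketched as a single pass over the vertices of the raw graph. The data layout (a dictionary of 2D points and a list of vertex-index pairs), the function names, and the rule of skipping a substitution that would duplicate an existing edge are assumptions made for this illustration; the text does not specify them.

```python
import math


def bending_angle(p, q, r):
    """Angle in degrees between directions p->q and q->r (0 means straight)."""
    ax, ay = q[0] - p[0], q[1] - p[1]
    bx, by = r[0] - q[0], r[1] - q[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.degrees(math.atan2(cross, dot)))


def simplify(points, edges, max_angle_deg):
    """Replace E_pq, E_qr by E_pr at degree-2 vertices with a small bend."""
    adj = {v: set() for v in points}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    for q in list(adj):  # snapshot, since vertices are deleted in the loop
        if len(adj[q]) != 2:
            continue
        p, r = adj[q]
        if r in adj[p]:  # assumed rule: do not duplicate an existing edge
            continue
        if bending_angle(points[p], points[q], points[r]) < max_angle_deg:
            adj[p].discard(q)
            adj[r].discard(q)
            adj[p].add(r)
            adj[r].add(p)
            del adj[q]
    return {frozenset((a, b)) for a in adj for b in adj[a]}


if __name__ == "__main__":
    pts = {0: (0, 0), 1: (1, 0), 2: (2, 0.01), 3: (2, 1)}
    print(simplify(pts, [(0, 1), (1, 2), (2, 3)], 10.0))
```

In the usage example, vertex 1 lies on an almost straight run and is removed, while vertex 2 marks a sharp corner and is kept.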

After the vectorization, attributes such as edge width, vertex degree, and
user-defined values can be associated with the vertices and edges of the graph
to facilitate the presentation of the crack pattern and even the animation of
crack propagation. An input crack-like pattern, a raw feature pattern, and the
corresponding planar graph are shown in Figure 2. The figure shows that the
crack-like pattern reconstructed from the vectorized graph is similar to the
original input image.



3.2 Graph Transformation 

To wrap the 3D object in the planar graph G = (V, E), the planar graph needs
to be transformed onto a 3D object enclosing the target object. This idea is
similar to paraboloid mapping [14]. Prior to the graph transformation, we
set up a normalized coordinate system for all vertices in V so that the origin
is at the center of the planar graph. Then all normalized vertices v(s, t) in G,
where s, t ∈ [−1, 1], are transformed onto the surface of the paraboloid bounding
volume of the target object with a rotation vector U(u, v), where u, v ∈ [−4, 4],
as shown in Figure 3, by



where r is the radius of the bounding sphere of the target object, and u and v
are the rotation factors about the Y and X axes, respectively, which indicate
where the feature pattern should be projected. The paraboloid bounding volume
is composed of two co-mirroring paraboloids. Figure 3 shows an example of the
paraboloid transformation.

Fig. 3. Graph transformation and edge projection: (a) paraboloid, (b) paraboloid
bounding volume, (c) edge projection

Many mechanisms can be employed to perform this transformation. The
proposed paraboloid transformation is simple but effective, and preserves angles
for almost all edges, i.e., it introduces little distortion except for a few
edges near the corner regions. Note that there is no guarantee that the
crack-like pattern projected onto a 3D object is the same as its input image.
The degree of distortion depends on the projection and on how well the target
object fits the paraboloid.



3.3 Generating Reference Points and Projecting Crack-Like Pattern 

Now all edges of the planar graph have been transformed onto the surface of the
paraboloid bounding volume. These edges are then mapped onto the target object
by projecting them onto the surface of the object. To conduct the projection, a
set of reference points is used. For a uniformly distributed projection,
reference points are generated according to the shape distribution of the
target object: we use the vertex mean of the target object, M(x_m, y_m, z_m),
the vertex variances along the three main axes, {x_v, y_v, z_v}, and a
user-controlled weight W(w_x, w_y, w_z) to generate reference points that
approximate a proper shape distribution. Each normalized vertex v_i(s, t) ∈ V
generates a reference point p_i(x, y, z) with the same rotation vector U as in
the graph transformation by



Eventually, the projected edge is the intersection between the object’s facets
and the quad formed by the two endpoints of the projecting edge and the
corresponding reference points, as shown in Figure 4(a). The user-controlled
weight W dominates the result of the intersection: if W is (0, 0, 0), then all
edges in E are projected toward M, the center of the target object, as shown in
Figure 4(c); if W is anisotropic, it leads to a different deformation of the
projected pattern, as shown in Figure 4(d). If all reference points are inside
the target object, the projected pattern will have the same topology as the
feature pattern; otherwise, some edges or some portions of the feature pattern
will not appear on the surface of the target object. The projection is not
distortionless, but the correspondence between the input 2D crack-like
patterns and the 3D cracks presented on objects is intuitive for most objects,
especially for convex-like objects.

Fig. 4. Projection: (a) A quad is formed by an edge e(v_1, v_2) and the
corresponding reference points r_1 and r_2. Edge e′ is the intersection between
the object and the quad. (b) An octahedron is constructed from an edge
e(v_1, v_2) and four jittered vertices {r_1′, r_1″, r_2′, r_2″} of the reference
points r_1 and r_2. Edge e′ is the area in green that will be rendered by
graphics hardware. (c) Projecting a 6 × 6 regular grid on a ball toward the
center of the object. (d) Projecting a 6 × 6 regular grid on a ball with an
anisotropic W.



3.4 Pattern Presentation 



After constructing quads for all pairs of edges and reference points, the
intersections between the quads and the object need to be calculated to obtain
the projected edges. However, computing the intersections is time consuming.
Instead, we propose a hardware-accelerated mechanism that quickly visualizes
the intersections and makes interactive control feasible. Whenever the view,
feature pattern, target object, graph transformation, or distribution of
reference points is changed, the projected crack pattern is re-rendered on the
fly to get the correct visual result of the intersections. We exploit the
stencil buffer. The target 3D object is first rendered into the z-buffer only,
and then writing to both the z-buffer and the color buffer is temporarily
disabled. Then the stencil buffer is enabled and initialized by

\[
\mathrm{stencil}(x) =
\begin{cases}
2, & \text{if } x \in \mathrm{proj}(obj) \\
0, & \text{otherwise}
\end{cases}
\]



where proj(obj) is the projected area of the target object. As Figure 4(b) shows,
for each quad, an octahedron constructed from a projecting edge and four
jittered points of the two reference points along the normal direction of the
quad is rendered to update the stencil buffer. For each pixel being
rasterized, if its stencil value is greater than or equal to 2 and its z-value
is smaller than the value in the z-buffer, the LSB of its stencil value is
inverted. This can be implemented by incrementing the stencil value with a
write mask in which all bits are 0 except the LSB. After the above processing,
cracks are rendered in the stencil buffer where the stencil values are 3.

Fig. 5. Six crack-like patterns: (a) a natural Aluminum Oxide crack pattern,
(b) another natural Aluminum Oxide crack pattern, (c) a natural mud crack
pattern, (d) a crack-like pattern captured from a real cup, (e) a map pattern,
(f) a Voronoi diagram pattern. Patterns (a) and (b) are obtained from the website
http://www.physics.utoronto.ca/nonlinear/gallery.html

To show the carving effect on the surface of the target object in the frame
buffer, we just need a few simple steps. First, we clear the z-buffer, set the
stencil function to test for equality to 2, and set the stencil operations to
do nothing. Second, we draw the object again, so the object is rendered in the
frame buffer except at pixels whose stencil values are 3. Then we set the
stencil test for equality to 3, and draw the object with a different material
and a scaling value s (s < 1, s ≈ 1). The smaller object then appears where
the cracks should be. In this way we can control not only the material of the
cracks but also the geometric appearance of the target object. Because the
constructed octahedron is closed, the projections of its front faces and back
faces may completely overlay each other in the stencil buffer. In our method,
only the fragments that are visible and bounded by the octahedron will be
marked as 3 in the stencil buffer. The entire process can be described as a
function:



\[
\mathrm{stencil}(x) =
\begin{cases}
M_2\left(\sum_{t \in T} V(t, x)\right), & \text{if } x \in \mathrm{proj}(obj) \\
0, & \text{otherwise}
\end{cases}
\]

and

\[
V(t, x) =
\begin{cases}
1, & \text{if } x \in \mathrm{proj}(obj) \text{ and } Z_t(x) < Z_{Buffer}(x) \\
0, & \text{otherwise}
\end{cases}
\]

where x is a pixel in the stencil buffer, t is a triangle of the octahedron T,
proj(obj) is a function projecting a 3D object obj onto the screen space, M_2()
is a function that takes its input modulo 2, Z_t(x) is the z value of a
rasterized triangle t at pixel x, and Z_Buffer(x) is the z value at pixel x in
the z-buffer.
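
The parity idea behind this stencil update can be simulated for a single pixel in plain Python. This is a sketch of the logic as described in the text, not of any graphics API; the function name and arguments are hypothetical. It follows the textual description in which object pixels start at 2 and crack pixels end at 3, that is, the parity of the passing fragments is kept in the LSB while the higher bit stays set.

```python
def final_stencil(inside_object, z_object, fragment_depths):
    """Simulate the stencil value of one pixel after drawing one octahedron.

    inside_object   -- True if the pixel lies in proj(obj)
    z_object        -- depth of the object stored in the z-buffer
    fragment_depths -- depths of the octahedron triangles covering this pixel
    """
    if not inside_object:
        return 0
    stencil = 2  # initial value for pixels covered by the object
    for z in fragment_depths:
        if stencil >= 2 and z < z_object:  # stencil test and depth test pass
            stencil ^= 1  # invert only the LSB: 2 <-> 3
    return stencil


if __name__ == "__main__":
    # Front face in front of the surface, back face behind it: a crack pixel.
    print(final_stencil(True, 0.5, [0.3, 0.7]))
```

A closed octahedron usually covers a pixel with one front-facing and one back-facing triangle. If both lie in front of the surface, the LSB flips twice and returns to 2; only when the surface passes between them does a single flip leave the value at 3, which is why no explicit intersection computation is needed.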

This idea is similar to a mechanism used in the shadow volume technique [15],
but our technique requires only a 2-bit stencil buffer. The advantages of the
proposed mechanism are that (1) it is an efficient hardware acceleration method
without the z-fighting problem, (2) it is easy to dynamically change or
reposition the crack pattern and adjust the width of the projected crack edges,
and (3) it is independent of the view, which means there is no need to
determine the front or back face of each triangle, and triangles can be
rendered in an arbitrary order. One drawback is that some insignificant
errors may occur at connections between crack edges, caused by over-flipping
in overlapped areas.

Fig. 6. Presenting crack-like patterns on 3D objects: (a) an Aluminum Oxide
crack pattern presented on a bunny, (b) another Aluminum Oxide crack pattern
presented on a dragon, (c) a mud crack pattern presented on a turtle, (d) a
captured crack-like pattern presented on a teapot, (e) a map pattern presented
on a vase, (f) a Voronoi diagram presented on a teacup



4 Experimental Results 

Many crack-like patterns can be presented on 3D objects.
As shown in Figures 5 and 6, two Aluminum Oxide crack patterns are presented
on a bunny and a dragon, one crack pattern of dry mud is presented on a turtle,
one crack-like pattern captured from a real cup is presented on a teapot, one
map pattern is presented on a Chinese utensil, and one Voronoi diagram pattern
is presented on a teacup. The fragments of the surface where the cracks lie are
cut out so that the surface geometry is displaced to reveal the carving effect,
as shown in Figure 8(a). As the experimental results show, a presented pattern
looks real if the crack-like pattern comes from a real natural one.

For a convex-like object, one reference point is enough to get a good result.
However, for a model with an irregular shape, like a dragon, a single
reference point results in a highly distorted crack pattern, as shown in
Figure 7(a). Because it is not easy to estimate how many reference points are
needed and where they should be placed, we propose the automatic reference
point generation. Using this mechanism with some user interaction, we can get a
visually good crack pattern on an arbitrary object, as shown in Figure 7(b).

Although we do not need to compute the 3D information of the projected
feature pattern, we might obtain a better visual effect by using this 3D
information to light the surface around the crack edges, producing richer
bumping or carving effects on the 3D object, as shown in Figure 8(b) and 8(c).
Besides, because the feature pattern is a graph, it is feasible to demonstrate
the animation of crack propagation on the surface of 3D objects with proper
traversal schemes. Figure 9 shows a cracking animation that simulates the crack
propagation of drying mud due to high temperature, based on an edge-priority
traversal scheme.

Fig. 7. Comparison of reference point selections for an irregularly shaped
object: (a) using only reference points at the center of the object, (b) using
automatic reference point generation with a little user interaction, where
W = (1.5, 1.3, 0). Both use the crack pattern in Figure 5(a) as input

Fig. 8. Visual effects: (a) Geometric carving effect using the stencil
technique. The bottom-right corner of the image enlarges the area in the orange
rectangle to show the carving effect. (b) Bumping and (c) carving visual
effects from our previous work, obtained by changing the luminance around the
crack edges of the 3D graph constructed from the 3D information of the
projected feature pattern [16]

5 Conclusion 

In this paper, we introduce an approach for interactive crack-like pattern
presentation on the surface of 3D objects. The proposed approach is an
automatic process, from processing images of various crack-like patterns and
constructing graphs from the feature crack patterns to presenting these graphs
on the surface of 3D objects with a set of corresponding reference points.
Thanks to the hardware stencil buffer, a feature pattern can be rendered on the
3D object’s surface without expensive intersection computation. Interactive
rendering provides a quick snapshot for a specific set of reference points and
feature pattern settings.

Fig. 9. Six representative snapshots (a)-(f) selected from an animation of the
priority-based traversal scheme in our previous work [16]



In the near future, the presentation of crack patterns on a 3D object will be
improved. The visual effect of cracks can be presented more realistically by
exploring an illumination model of cracks. Furthermore, we consider re-tessellating
the object according to the crack pattern to demonstrate an animation of
object breaking or peeling.

Abstract. Occlusion culling has been studied extensively in computer
graphics for years. In this paper, we propose an occlusion culling approach
for dynamic scenes that exploits multiple hardware-accelerated occlusion
queries using the concept of the eye-siding number. By organizing a regular
grid with overlapping voxels as an octree-like hierarchy, the actual positions
of dynamic objects can be updated efficiently. Based on the eye-siding number,
the occlusion front-to-back order of the nodes can be enumerated efficiently,
and the number of parallelizable occlusion queries for nodes in the hierarchy
can be maximized during traversal. Experimental results show that, for all
frames of the test walk-through in a dynamic environment, our approach
improves the overall performance.



1 Introduction 

Occlusion culling (OC) has been studied extensively in computer graphics in
recent years. In this paper, we propose an OC approach based on exploiting
multiple hardware-accelerated occlusion queries (HaOQs) for dynamic scenes in
a hierarchical representation. In our approach the space of the scene is divided
into a regular grid with overlapping voxels and organized as an octree. In each
frame, we efficiently update the positions of dynamic objects and build the
object list for each voxel. Then, the hierarchy traversal proceeds in an occlusion
front-to-back (ftb) order, and hidden nodes are culled away according to the
results of parallelized occlusion queries (OQs). Finally, the objects contained in
a visible voxel are rendered. The proposed approach exploits the eye-siding
number of the nodes in the hierarchy so that we can efficiently provide an
occlusion ftb order while traversing and effectively maximize the number
of parallelizable OQs for nodes in the hierarchy.

Below we summarize the contributions of our approach.

- Efficient Hierarchy Construction and Update: Hierarchical spatial data
structures have been used for accelerating OQs, but the cost of hierarchy
construction and update for dynamic objects is high. In this paper, we use a
regular grid with overlapping voxels to uniformly partition the scene. With the
regular grid, the octree-like hierarchy is simple and rapid to construct, and with
the overlapping voxels the hierarchy update for dynamic objects can be done
efficiently.
- Fast Occlusion Front-To-Back Order Enumeration: While traversing the
hierarchy for OC, an occlusion ftb traversal order improves the performance. We
propose an efficient scheme to provide an occlusion ftb order using the
eye-siding number of nodes in the hierarchy. The concept of the eye-siding number
can be further explored to classify nodes into parallel units for multiple HaOQs.
- Maximizing the Utilization of HaOQs: Visibility culling increasingly relies on
the HaOQ. However, setting up and waiting for an OQ stalls the rendering
pipeline. The more queries are sent for occlusion evaluation at a time, the
better the performance. Nodes with the same eye-siding number in an occlusion
ftb order sequence can be grouped into a parallel unit and sent for occlusion
evaluation at a time. Maximizing the number of parallelizable OQs for nodes in
the hierarchy improves the performance.

There are three approaches similar to our method. Govindaraju et al. [8]
switched the roles of two GPUs for performing OC in parallel between successive
frames. The parallelism of OQs is exploited by sending all possible OQs for
the nodes at a given level in the hierarchy at a time. However, the queries are
not guaranteed to be issued in an occlusion ftb order, and the occlusion
representation from the previous frame may not be a good occlusion approximation
for the current frame. Hillesland et al. [11] decomposed the static scene using a
uniform grid and a nested grid and made use of HaOQ to evaluate the visibility in
an ftb order determined by a variant of the axis-aligned slabs. To reduce the
setup cost, the pipeline is kept busy by submitting n cells in a slab at a time,
and the subgrids contained in a visible cell are traversed recursively. This
method is simple and fast. However, too many OQs are sent for visibility
evaluation in a scene represented by the uniform grid. Also, for the nested grid
traversal, it is less effective at reducing pipeline stalls because multiple
OQs are selected only from a single subgrid of a visible cell. Staneker et al.
[22] proposed the software-based Occupancy Map to significantly reduce the
overhead of HaOQs and to arrange multiple OQs when browsing a static scene. The
proposed method is useful for scenes with low occlusion. However, the screen
space bounding rectangle is so conservative that it tends to obtain low
occlusion effectiveness, especially in a dense environment.

The rest of this paper is organized as follows. In Section 2, a survey of
OC is presented. The proposed methods are specified in Section 3. In Section 4
we show the experimental results. Finally, the conclusion and future work are
given in Section 5.



2 Related Works 

A recent survey of OC is given in [6]. The OC algorithms in
[4,5,7,10,13,16,19,20,24] are conservative. A few approaches [2,14,21] were
proposed to approximate the rendering. The approximation technique sacrifices
visibility conservativeness for performance and simplicity of implementation.
Region-based visibility techniques [7,16,17,20,24] compute the visibility and
record the visible objects in each region. In rendering, the view-cell in which
the viewpoint is located is found and its visible objects are rendered.
Point-based visibility methods [1,4,5,14], relying on the identification of large
occluders, compute the visibility on the fly. Region-based techniques work well
for scenes with large convex objects as occluders and benefit from viewpoint
coherence. However, they take a long preprocessing time, require large storage
space, and result in low culling effectiveness. Point-based methods address
moving objects but are less effective in occlusion fusion. The object-based
approach [5] evaluates the occlusion by comparing the occlusion volumes formed
with raw 3D objects. These approaches utilize spatial hierarchies, but they
have difficulty performing occlusion fusion for small occluders.
Projection-based schemes [1,3,4,14,20] evaluate occlusion by testing the
projected region of objects against the maintained occlusion information.
Approaches of analytic visibility [10,13,16] exploit the geometry information of
special domains to determine the visibility. Projection-based and analytic
approaches can fuse the occlusion in the space of their overlap tests.

Sudarsky and Gotsman [23] update dynamic objects using temporal coherence
only for potentially visible objects and expired TBVs (Temporal Bounding
Volumes). Output-sensitivity is provided, but the method assumes that the motion
of objects is predictable. Besides, the update of hierarchies may still be too
expensive. Batagelo and Wu [3] adapted [20] to dynamic scenes using [23]. The
scene is discretized into voxels which maintain volumetric characteristics:
occluder, occlusion, identifiers, and TBVs matrix. To reduce the computation of
spanning voxels, TBVs are used for hidden dynamic objects. Their method can
handle truly dynamic environments, and output-sensitivity is provided. Although
the voxel traversal approximates the ftb order, it cannot exploit the advantages
of hierarchy schemes like the methods given in [5,12,16,23]. In a densely
occluded scene, this may result in more traversals. The algorithms proposed in
[2,8,9,11,15,18,22] evaluate the occlusion by performing HaOQs. These
techniques perform faster if the bounding boxes contain a large number of
objects, and the effectiveness of the OQ depends on the underlying hardware and
input models.

3 Proposed Approach 

3.1 Scene Organization and Dynamic Object Update 

A spatial data structure, the regular grid, is used to organize the scene so that
the construction simply computes the object list for each cell (axis-aligned
voxel). The voxel size is set to a multiple of the average bounding-box size of
the objects that form the majority in the scene, so that a voxel can contain
several objects. If most objects are small but a few are large, we divide the
large objects into small ones to increase the probability of considering them
as hidden.

So far, solutions for handling cross-node objects might not be feasible for
dynamic objects, require large memory space when the number of cross-node objects
increases, or suffer from low OC effectiveness. To address cross-node objects, we
conceptually extend every voxel up to a given constrained size in each axis's
positive direction such that each object located in a node can be fully contained
in the extended voxel, called the overlapping voxel. The constrained size is the
maximum of the dimensions of the bounding boxes of the dynamic objects that form
the majority. Of course, it is smaller than the size of the voxel, which is set
to contain several objects.

To minimize the update time for dynamic objects, the scene is scaled such 
that each voxel is a unit cube so that the minimal vertex of the bounding box 
of an object can be used to represent its position. The object’s owner voxel is 
simply determined using the integral parts of its position. Let the minimal vertex 
of the bounding box of an object be (1.23, 8.0, 7.96), then the owner voxel is 
indexed by (1, 8, 7), and the object is inserted into the object list of the voxel. 
With the overlapping voxel, every dynamic object can be exactly assigned to a 
voxel. Besides, scaling voxel to a unit cube speeds up the dynamic object update. 
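
The owner-voxel lookup described above can be illustrated with a minimal Python sketch. The function names, the dictionary-based object lists, and the clamping at the grid boundary are assumptions made for illustration; the paper only specifies that the integral parts of the minimal bounding-box vertex index the owner voxel.

```python
import math


def owner_voxel(bbox_min, grid_dims):
    """Return the integer index (x, y, z) of the voxel that owns an object.

    bbox_min  : minimal vertex of the object's bounding box, in grid units
                (the scene is assumed to be scaled so each voxel is a unit cube).
    grid_dims : number of voxels along x, y and z.
    """
    index = []
    for coord, dim in zip(bbox_min, grid_dims):
        i = math.floor(coord)  # integral part of the position
        # Clamping keeps boundary objects inside the grid; this is an
        # assumption of the sketch, not something the paper discusses.
        index.append(min(max(i, 0), dim - 1))
    return tuple(index)


def rebuild_object_lists(objects, grid_dims):
    """Build the per-voxel object lists for one frame.

    objects: mapping from an object id to the minimal vertex of its bounding box.
    """
    lists = {}
    for obj_id, bbox_min in objects.items():
        lists.setdefault(owner_voxel(bbox_min, grid_dims), []).append(obj_id)
    return lists
```

Because each object maps to exactly one voxel, the per-frame update is a single pass over the dynamic objects with constant work per object.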

Organizing the space of a scene as a hierarchy makes traversal efficient. We 
build a hierarchy based on the voxels and render the scene while traversing. In 
the special case, the grid can be treated as an octree. In general cases, dimensions 
of the grid are arbitrary. We can treat a grid as an octree-like hierarchy. The 
root of the hierarchy holds the whole grid. The subdivision starts from the root, 
and proceeds to partition from the spatial median along each axis. 

If there is only one voxel left in an axis, the partition stops in the axis. If 
there is no divisible axis, the division terminates and the node is a leaf. 
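
The subdivision rule can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the `Node` class, the half-open index ranges, and the use of integer division for the spatial median are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple  # inclusive lower voxel index (x, y, z)
    hi: tuple  # exclusive upper voxel index (x, y, z)
    children: dict = field(default_factory=dict)  # octant code -> Node

    @property
    def is_leaf(self):
        return not self.children


def build_hierarchy(lo, hi):
    """Recursively split the voxel range [lo, hi) at the median of each axis."""
    node = Node(lo, hi)
    # An axis is divisible only while it still spans more than one voxel.
    divisible = [h - l > 1 for l, h in zip(lo, hi)]
    if not any(divisible):
        return node  # no divisible axis left: the node is a leaf
    mid = [(l + h) // 2 for l, h in zip(lo, hi)]
    # Bit k of the code is set when the child lies in the positive half of axis k.
    for code in range(8):
        c_lo, c_hi, valid = [], [], True
        for axis in range(3):
            positive = (code >> axis) & 1
            if not divisible[axis]:
                if positive:
                    valid = False  # an undivided axis has no positive half
                    break
                c_lo.append(lo[axis])
                c_hi.append(hi[axis])
            elif positive:
                c_lo.append(mid[axis])
                c_hi.append(hi[axis])
            else:
                c_lo.append(lo[axis])
                c_hi.append(mid[axis])
        if valid:
            node.children[code] = build_hierarchy(tuple(c_lo), tuple(c_hi))
    return node


def count_leaves(node):
    """Count the leaf nodes (single voxels) below a node."""
    if node.is_leaf:
        return 1
    return sum(count_leaves(child) for child in node.children.values())
```

For a grid whose dimensions are equal powers of two, every internal node has eight children and the result is an ordinary octree; otherwise some nodes have only two or four children.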



3.2 Occlusion Front-to-Back Order 

With ftb order, distant nodes tend to be hidden so the number of visited nodes
can be reduced. Although Bernardini et al. [4] determine the ftb order by
looking up a pre-constructed table based on the viewpoint, the look-up result
might not be further used in maximizing the number of parallelizable OQs in
our approach.

To efficiently enumerate an occlusion ftb order for the nodes, the octants in a
node are encoded using 3-bit codes. The bits represent the partition planes
orthogonal to the x, y, and z axes, respectively. A bit is set if the octant is in
the positive half-space of the corresponding partition plane. Figure 1(a) shows an
example of the encoded octants. Then, we sort the octants into an occlusion ftb
order, O0, O1, ..., O7, by the eye-siding number. The eye-siding number indicates
how many times the node lies in the same half-space of the partition planes as
the viewpoint. The 3-eye-siding octant, O0, containing the viewpoint is the first
node. The 0-eye-siding octant, O7, which is not in the same half-space as the
viewpoint for any partition plane, is the last node. The three 2-eye-siding
octants, O1, O2, and O3, which are in the same half-space as the viewpoint with
respect to two of the three partition planes, come second, and the three
1-eye-siding octants, O4, O5, and O6, which are in the same half-space for one of
the three partition planes, come third. The proposed algorithm for occlusion ftb
order enumeration is described as follows:

DetermineFront2BackOrder(Node) // Node: an internal node of the octree
{
  SetBit = 1;
  for i in {x, y, z} {
    if (Ei > Node.Centeri) { // E: eye position, Node.Center: center of the node
      eyesidei = SetBit; // eyesidei: indicates the eye side for three axes
      oppsidei = 0;      // oppsidei: indicates the opposite side
    }
    else {
      eyesidei = 0;
      oppsidei = SetBit;
    }
    SetBit = SetBit << 1; // <<: shift left bitwise operator
  } // end of for i in {x, y, z}
  O0 = eyesidex | eyesidey | eyesidez; // |: bitwise OR operation
  O1 = eyesidex | eyesidey | oppsidez;
  O2 = eyesidex | oppsidey | eyesidez;
  O3 = oppsidex | eyesidey | eyesidez;
  O4 = eyesidex | oppsidey | oppsidez;
  O5 = oppsidex | eyesidey | oppsidez;
  O6 = oppsidex | oppsidey | eyesidez;
  O7 = oppsidex | oppsidey | oppsidez;
  return O; // O: the front-to-back order array
} // end of DetermineFront2BackOrder(Node)

Fig. 1. (a) Encoded octants. The gray octant is in the positive half-space of
the y axis but in the negative half-space of both the x and z axes. (b) An
occlusion ftb order for the viewpoint located in octant 011. (c) The eye-siding
number for four children in a quad-tree node



Figure 1(b) shows an occlusion ftb order for the viewpoint in octant 011. Let
Node.Center be (0, 0, 0); the eyeside and oppside vectors are then (1, 2, 0) and
(0, 0, 4), respectively. Therefore, an occlusion ftb order, i.e., O0, O1, O2, O3,
O4, O5, O6, O7, is O0 = 1|2|0 = 011 = 3, O1 = 1|2|4 = 111 = 7, O2 = 1|0|0 = 001 = 1,
O3 = 0|2|0 = 010 = 2, O4 = 1|0|4 = 101 = 5, O5 = 0|2|4 = 110 = 6, O6 =
0|0|0 = 000 = 0, and O7 = 0|0|4 = 100 = 4. Notice that the order in which the
partition planes are selected influences the produced node sequence of the ftb
order, but the actual occlusion order is equivalent. It is easy to adapt this
algorithm to hierarchies of voxels with arbitrary dimensions, not just powers of 2.
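
As a cross-check of the worked example, the enumeration can be written as a short Python function. This is a direct transcription of the pseudocode under the assumption that the eye position and node center are given as (x, y, z) tuples; the function name is chosen for illustration.

```python
def determine_front_to_back_order(eye, center):
    """Return the eight octant codes of a node in occlusion front-to-back order.

    eye    : eye position (x, y, z)
    center : center of the node (x, y, z)
    """
    eyeside, oppside = [], []
    set_bit = 1
    for e, c in zip(eye, center):
        if e > c:
            # The eye is in the positive half-space of this partition plane.
            eyeside.append(set_bit)
            oppside.append(0)
        else:
            eyeside.append(0)
            oppside.append(set_bit)
        set_bit <<= 1
    ex, ey, ez = eyeside
    ox, oy, oz = oppside
    return [
        ex | ey | ez,                              # 3-eye-siding octant
        ex | ey | oz, ex | oy | ez, ox | ey | ez,  # 2-eye-siding octants
        ex | oy | oz, ox | ey | oz, ox | oy | ez,  # 1-eye-siding octants
        ox | oy | oz,                              # 0-eye-siding octant
    ]
```

An eye position such as (1, 1, -1) with the node center at the origin lies in octant 011 and yields the eyeside and oppside vectors of the example.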

While traversing the hierarchy in an occlusion ftb order, the visibility of each 
node is evaluated using the HaOQ for the bounding box of node which consists 
of its descendant overlapping voxels. If a node is invisible, its recursion stops. If 
a visible leaf node is found, its contents have to be rendered. 

3.3 Maximizing Parallelizable Occlusion Queries 

While using HaOQ, the parallelism must be exploited to reduce the stalling.
To maximize the number of parallelizable OQs while traversing the hierarchy,
the nodes are divided into parallel units. The eye-siding number encoded for
the octants of a node helps to identify parallelizable nodes. Nodes with the
same eye-siding number are parallelizable because the rendering order
doesn't affect the occlusion result. Hence, the eight child octants of a node can
be classified into parallel units by their eye-siding number. There are four
parallel units for the octants in a node. The 3-eye-siding unit includes only
the octant in which the viewpoint lies. The 0-eye-siding unit includes only the
octant that is on the opposite side of the 3-eye-siding octant. The 2-eye-siding
unit has three octants, which lie on the eye side of two of the three partition
planes. The 1-eye-siding unit also has three octants, which lie on the eye side
of one of the three partition planes. Figure 1(c) shows a quad-tree case. Node C
is 2-eye-siding because it contains the viewpoint. Nodes A and D are both
1-eye-siding since these two nodes are on the eye side of the x and y partition
planes, respectively. Node B is not on the eye side of either partition plane,
so it is 0-eye-siding. Hence, there are three parallel units: {C}, {A, D}, and
{B}. For each parallel unit, the eye-siding numbers of its descendants are
determined by their corresponding partition planes, and all children in a level
can be classified into parallel units. For example, as shown in Figure 2, the
child nodes of parallel unit {A, D} are divided into the parallel units
{A3, D3}, {A1, A4, D1, D4}, and {A2, D2} according to their eye-siding number.

Fig. 2. A 2D example of the parallel unit hierarchy. (a) A node hierarchy where
nodes A and D are each partitioned into four subnodes. (b) The children
of the root node are grouped into three parallel units by the eye-siding number:
{C}, {A, D}, and {B}. Similarly, the parallel unit {A, D} can be further divided
into three parallel units: {A3, D3}, {A1, A4, D1, D4}, and {A2, D2}. (c) An ftb
order of parallel unit {A, D}
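
The grouping of children by eye-siding number can be sketched in a few lines of Python. The sketch assumes that both the child and the octant containing the viewpoint are identified by their bit codes; the `axes` parameter (3 for an octree, 2 for the quad-tree case) and the function names are illustrative.

```python
def eye_siding_number(child_code, eye_code, axes=3):
    """Number of partition planes for which the child is on the eye side."""
    differing = (child_code ^ eye_code) & ((1 << axes) - 1)
    return axes - bin(differing).count("1")


def parallel_units(eye_code, axes=3):
    """Group all child codes of a node into parallel units, front to back.

    The returned dict maps an eye-siding number to the list of child codes
    sharing that number; keys are ordered from `axes` down to 0.
    """
    units = {k: [] for k in range(axes, -1, -1)}
    for code in range(1 << axes):
        units[eye_siding_number(code, eye_code, axes)].append(code)
    return units
```

With the viewpoint in octant 011, the units contain one, three, three, and one octant, matching the four parallel units described above; with `axes=2` the units have sizes one, two, and one, as in the quad-tree case.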

During the hierarchy traversal, the parallel units are examined one by one
recursively rather than visiting individual nodes. The parallel unit at the top
of the scene graph contains the root node only. For each visited unit, the nodes
in it are evaluated using parallel OQs, and the hidden ones are excluded from
further processing. The children of the visible nodes are classified into four
eye-siding sets, i.e., the parallel units Pi, i = 3, 2, 1, 0. These four units
are recursively traversed in depth-first order until the leaf nodes are reached.
In practice, the number of parallel queries is bounded by the hardware
implementation. If the number of parallelizable OQs exceeds the bound, the
queries are divided into multiple passes. The algorithm of multi-pass parallel
OQ is described as follows:

MultipassParallelOcclusionQuery(pNodes) // pNodes: the set of parallelizable nodes
{
  V = empty;
  while (pNodes is not empty) {
    Get nNodes = {Node1, Node2, ..., NodeN} out of pNodes; // N: the max. no. of parallelizable OQs
    // switching hardware setting
    UpdateColorBuffer(DISABLE);
    UpdateDepthBuffer(DISABLE);
    // send n (<= N) nodes for occlusion query at a time
    for i = 1 to Cardinality(nNodes) {
      BeginOcclusionQuery(Qi); // Qi is the occlusion query identifier for Nodei
      Draw(nNodes.Nodei);
      EndOcclusionQuery(Qi);
    }
    UpdateColorBuffer(ENABLE);
    UpdateDepthBuffer(ENABLE);
    // get the pixel count of occlusion evaluation back
    for i = 1 to Cardinality(nNodes)
      V = V union (RequestPixelCount(Qi) > 0 ? Nodei : empty);
    pNodes = pNodes - nNodes;
  } // end of while (pNodes is not empty)
  return V; // V: the set of visible nodes
} // end of MultipassParallelOcclusionQuery(pNodes)
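
The batching logic of the multi-pass scheme can be illustrated independently of any graphics API. In the following Python sketch, `issue_query` and `pixel_count` are hypothetical callbacks standing in for the begin/draw/end sequence and the result read-back of the pseudocode; they are assumptions of this sketch and not calls of a real library.

```python
def multipass_parallel_occlusion_query(nodes, max_queries, issue_query, pixel_count):
    """Evaluate the visibility of `nodes` in passes of at most `max_queries`.

    issue_query(node)   -> query handle (hypothetical stand-in for
                           BeginOcclusionQuery / Draw / EndOcclusionQuery)
    pixel_count(handle) -> number of pixels that passed the depth test
                           (hypothetical stand-in for RequestPixelCount)
    Returns the list of visible nodes and the number of passes used.
    """
    if max_queries < 1:
        raise ValueError("max_queries must be at least 1")
    visible = []
    pending = list(nodes)
    passes = 0
    while pending:
        batch, pending = pending[:max_queries], pending[max_queries:]
        # Issue every query of the batch before reading any result back,
        # so the results are waited for once per pass, not once per node.
        handles = [issue_query(node) for node in batch]
        for node, handle in zip(batch, handles):
            if pixel_count(handle) > 0:
                visible.append(node)
        passes += 1
    return visible, passes
```

In a real renderer the two callbacks would wrap the occlusion query mechanism of the graphics API, with color and depth writes disabled while the bounding boxes are drawn.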



Furthermore, we must keep the ftb order while OQs are requested for parallel
units. An ftb order for the children of a node is determined using the procedure
DetermineFront2BackOrder(). The returned array O contains an occlusion ftb
order of the child nodes, and the elements in array O reveal the eye-siding order.
Namely, the parallel unit traversal sequence, P3 = {O0}, P2 = {O1, O2, O3},
P1 = {O4, O5, O6}, P0 = {O7}, is in ftb order. Figure 2 shows an example of the
hierarchy of parallel units and their occlusion ftb order. We summarize the
traversal scheme with ftb order as follows:

ParallelUnitOcclusionQueryTraversal(P) // P: the parallel unit
{
  P3 = P2 = P1 = P0 = empty; // Pi: i-eye-siding parallel unit
  V = MultipassParallelOcclusionQuery(P.Nodes); // V: the set of visible nodes
  for each (Nodei in P.Nodes && Nodei in V && Nodei != LeafNode) {
    // determine the front-to-back order for children of Nodei
    O = DetermineFront2BackOrder(Nodei); // O: the front-to-back order array
    // insert children of Nodei with eye-siding number i into parallel unit Pi
    P3 = P3 union {Nodei.Child[O0]};
    P2 = P2 union {Nodei.Child[O1], Nodei.Child[O2], Nodei.Child[O3]};
    P1 = P1 union {Nodei.Child[O4], Nodei.Child[O5], Nodei.Child[O6]};
    P0 = P0 union {Nodei.Child[O7]};
  }
  ParallelUnitOcclusionQueryTraversal(P3);
  ParallelUnitOcclusionQueryTraversal(P2);
  ParallelUnitOcclusionQueryTraversal(P1);
  ParallelUnitOcclusionQueryTraversal(P0);
  return;
} // end of ParallelUnitOcclusionQueryTraversal(P)
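
A compact Python sketch may help to see how the traversal and the parallel units interact. It is an illustration under several assumptions: nodes are plain dictionaries with `center` and `children` (child code to node) entries, `query_visible` is a hypothetical callback that returns the visible subset of a unit (for instance a wrapper around a multi-pass query), `render` is a hypothetical callback for visible leaves, and the recursion stops on an empty unit, a guard the pseudocode leaves implicit.

```python
def children_by_eye_siding(eye, center):
    """Map eye-siding numbers 3, 2, 1, 0 to the child codes of a node."""
    # Bit k of eye_code is set when the eye is on the positive side of axis k.
    eye_code = sum(1 << k for k in range(3) if eye[k] > center[k])
    groups = {3: [], 2: [], 1: [], 0: []}
    for code in range(8):
        groups[3 - bin(code ^ eye_code).count("1")].append(code)
    return groups


def traverse_parallel_units(unit, eye, query_visible, render):
    """Traverse a parallel unit (a list of nodes) in occlusion ftb order."""
    if not unit:
        return  # empty unit: nothing to query (guard assumed by this sketch)
    visible = query_visible(unit)  # one batch of parallel OQs for the unit
    next_units = {3: [], 2: [], 1: [], 0: []}
    for node in visible:
        if not node["children"]:
            render(node)  # visible leaf: its contents are rendered
            continue
        groups = children_by_eye_siding(eye, node["center"])
        for k, codes in groups.items():
            for code in codes:
                child = node["children"].get(code)
                if child is not None:
                    next_units[k].append(child)
    # Units with a larger eye-siding number are closer to the viewpoint.
    for k in (3, 2, 1, 0):
        traverse_parallel_units(next_units[k], eye, query_visible, render)
```

The children of all visible nodes in a unit are merged before the next queries are issued, which is what allows nodes from different parents, such as A and D in Figure 2, to share one batch.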



4 Experimental Results 

We have implemented and tested our approach on a PC running Windows XP
with one P4 2.4 GHz CPU and 1 GB of main memory. The graphics card is based
on the GeForce 4 Ti 4400 chipset. All scenes are rendered with Gouraud
shading and one directional light at a screen resolution of 1024x768 pixels.

Fig. 3. Frame rate statistics of a walk-through for the proposed approach
using parallel and non-parallel OQs, Hillesland's method, and Z-buffer

To show the overall performance of our approach for an interactive
walk-through, we constructed a scene consisting of one million objects (778,497,348
polygons), of which one half is dynamic and the other half is static. The static
objects are torus knots (1920 polygons), hoses (920 polygons), and hollow boxes
(135 polygons); the dynamic objects are teapots (1024 polygons), tori (576
polygons), and stars (96 polygons). A grid of 16x16x16 voxels is used to
represent the scene, and the grid is scaled such that all voxels in the grid are
unit cubes to speed up the position update for dynamic objects. A snapshot of the
test walk-through environment is shown in Figure 4. The initial positions and
velocities of the dynamic objects are generated randomly. While objects are
moving, collision detection is performed only for objects against the scene
boundary to prevent objects from moving away. If an object collides with the
boundary, a new velocity is assigned.

Figure 3 shows the frame rate statistics of the proposed approach using parallel
OQ, non-parallel OQ, Hillesland's method [11], and Z-buffer. We implemented
the nested grid decomposition version of Hillesland's method to compare with
our approach. For all frames of the walk-through, the performance of parallel
OQ is the best. On average, we obtain 16.8434 fps for parallel OQ, 13.7902 fps
for non-parallel OQ, 15.6917 fps for Hillesland's method, and 0.95 fps for
Z-buffer. Namely, we have an 18.13% speed-up of parallel OQ over non-parallel
OQ and a 7.48% speed-up of parallel OQ over Hillesland's method.

Fig. 4. The snapshot of the scene used for the overall performance test



5 Conclusion 

In this paper, we proposed an OC approach based on exploiting multiple
parallelizable HaOQs. A regular grid with overlapping voxels is used to organize
the spatial data so that the actual positions of moving objects can be updated
efficiently, and the grid is easily represented as an octree-like hierarchy for
hierarchical traversal. By exploiting the eye-siding number of nodes in the
hierarchy, we can easily traverse the nodes in an occlusion ftb order. Also,
nodes with the same eye-siding number in an ftb order sequence can be grouped
into a parallel unit and sent for OQs at a time. As the experimental results
show, maximizing the number of parallelizable OQs for nodes in the hierarchy
improves the utilization of HaOQ and leads to better overall performance.

Currently, the objects in a visible voxel are all sent for rendering. In the
future, objects could be sorted into an approximate ftb order for occlusion
evaluation in a node if the rendering time of the objects is much higher than
that of performing OQs. In the near future, a mechanism that can decide whether
or not the OQ is worth applying for objects in a visible node is to be explored.
In this research the grid resolution is selected empirically, i.e., independently
of the number and size of objects. In addition, our experiments indicate that
the hierarchy traversal time, the time for performing node OQs, and the actual
position update time for dynamic objects depend on the grid resolution. An
automatic grid resolution determination scheme should be studied for improving
or even optimizing the overall performance.

Abstract. In volume visualization, a high-speed perspective rendering
algorithm that produces accurate images is essential. We propose an efficient
rendering algorithm that provides high-quality images and reduces rendering time
regardless of the viewing condition by using a depth-subsampling scheme. It
partitions the image plane into several uniform pixel blocks, then computes the
minimum depth for each pixel block. While conventional ray casting starts ray
traversal from the corresponding pixel, rays in our method can jump forward by
the calculated minimum z-value. It can also be applied to the space-leaping
method. Experimental results show that our method produces images of the same
high quality as volume ray casting while taking less time for rendering.



1 Introduction 

The most important issue in recent volume visualization is to produce high-quality
perspective-projection images in real time. Real-time volume rendering is highly
related to hardware-based approaches such as 3D texture mapping hardware [1,2]
and specialized volume rendering hardware [3,4]. Although they achieve 30 fps
without preprocessing, they are too expensive to be used in common applications,
and it is difficult to manipulate large volume data due to the limited size of
dedicated memory. In addition, some of the hardware cannot support perspective
rendering.

Volume ray casting is the best-known software-based rendering algorithm [5].
Although it produces high-quality images, it takes a long time due to randomness
in the memory reference pattern. In order to speed it up, several optimized
methods have been proposed [7-9]. They have mainly concentrated on efficiently
skipping over transparent or homogeneous regions using coherent data structures
such as octrees [10], k-d trees [11], and Run-Length Encoded (RLE) volumes [6].
However, these methods require a long preprocessing time and extra storage to
maintain the data structures. The template-based method improves rendering speed
by using spatial coherence, namely that the sampling pattern on each ray is
identical in parallel projection [12]. However, it is not suited to perspective
projection.

In this paper, we propose an efficient method that produces high-quality images
and reduces rendering time regardless of the viewing condition by applying a
depth-subsampling scheme to volume ray casting. Also, it does not require
preprocessing or additional data structures. It partitions the image plane into
several uniform pixel blocks and determines the minimum depth for each pixel
block. While the original method fires a ray from its corresponding pixel in the
view plane, our method moves the starting point of ray traversal forward by the
calculated minimum depth value.

Our method can also be used for accelerating the space leaping. Space leaping is 
regarded as an efficient rendering method that skips over empty space using distance 
information [7,8]. Much work has been done to devise efficient space leaping. Shar- 
ghi and Ricketts proposed a method that exploits a pre-processed visibility determina- 
tion technique and considers only a potentially visible set of boundary cells [13]. 
Vilanova et al. presented how to identify empty region of a tubular shaped organ as a 
series of cylindrical structures [14]. Although it is possible to accelerate volume ren- 
dering by skipping over those cylindrical regions, it requires some human interven- 
tion for cylindrical approximation. Wan et al. devised a method that exploits distance 
information stored in the potential field of the camera control model [15]. After that,
Wan et al. presented a Z-buffer-assisted space-leaping method, which derives most of 
the pixel depth at the current frame by exploiting the temporal coherence during 
navigation with cell-based reprojection scheme [16]. 

However, space leaping does not always provide the speed-up we expect. Its
performance may degrade depending on the viewing condition. Our method can
solve the problem with depth-subsampling. It computes and stores the z-depth only
for some pixels (one per pixel block) rather than all pixels. It does not require
complex cell reprojection to calculate the distance from the image plane to visible
cells. Also, it can rapidly produce images even when we cannot exploit temporal
coherence.

In Section 2, we present our algorithm in detail. Extension to space leaping is men- 
tioned in Section 3. Experimental results and remarks are shown in Section 4. Lastly, 
we summarize and conclude our work. 



2 Efficient Volume Ray Casting Using Depth-Subsampling 

The main drawback of volume ray casting is that rendering takes a long time due
to randomness in the memory reference pattern. We propose a method to reduce the
rendering time by identifying empty space with a depth-subsampling scheme and
skipping over the empty region. Assume that a view plane P has a resolution of
NxN. During rendering we can compute the z-value (depth value) $d_{ij}$ of a
pixel $p_{ij}$ without extra cost. The volume between the image plane (z = 0)
and the object surfaces to be displayed (z = $d_{ij}$) is defined as exactly
empty space (EES). We do not have to consider the values of voxels that lie in
this volume. However, calculating the z-value for all pixels does not contribute
to reducing the rendering time for the current frame. Instead, it requires more
storage to keep the z-values of all pixels.

Our method computes z-values for some pixels rather than all pixels, so we call it
depth subsampling. In the first step, our method divides a view plane P into several
uniform pixel blocks $G_{uv}$ as depicted in Figure 1. Given the size of a pixel
block as SxS, rays emanate from its four corner pixels. It performs full-range ray
traversal and shading to determine colors for those pixels. At the same time, it
stores the z-depth values into a depth buffer whose size is $(N/S)^2$. Since four
adjacent pixel blocks share a corner pixel, we have to store only a single depth
value for one pixel per pixel block. The minimum depth of $G_{uv}$,
$d^{min}_{uv}$, can be determined by comparing four depth values.
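
The minimum-depth computation can be sketched as follows. The sketch assumes that the corner depths are available as a two-dimensional list with one more row and column than there are pixel blocks, so that every block has four corners; this boundary handling and the function name are assumptions for illustration.

```python
def subsample_min_depth(corner_depth):
    """Compute the minimum corner depth of every pixel block.

    corner_depth[v][u] is the z-value found by the full-range ray fired
    from corner (u, v); the grid has (blocks + 1) rows and columns.
    Returns min_depth[v][u], the minimum of the four corner depths of
    the block whose upper-left corner is (u, v).
    """
    rows = len(corner_depth) - 1
    cols = len(corner_depth[0]) - 1
    min_depth = []
    for v in range(rows):
        row = []
        for u in range(cols):
            row.append(min(
                corner_depth[v][u], corner_depth[v][u + 1],
                corner_depth[v + 1][u], corner_depth[v + 1][u + 1],
            ))
        min_depth.append(row)
    return min_depth
```

Each corner depth is read by up to four neighboring blocks, which is why storing one depth per block corner is sufficient.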




Fig. 1. Depth-subsampling procedure. Our method samples depths of corner pixels for each
pixel block, determines the minimum depth, and saves them into a subsampled depth buffer




Fig. 2. A comparison of exactly empty space and potentially empty space for the same region 



Using depth-subsampling, we can identify potentially empty space (PES). Figure 2
shows the difference between EES and PES for the same region.

The depth values calculated in the previous step can be used for accelerating the
rendering process. In conventional ray casting, a ray emanates from its
corresponding pixel. In our method, on the other hand, all the rays that belong to
a pixel block $G_{uv}$ start their traversal at a distance of $d^{min}_{uv}$ from
their corresponding pixels. Figure 3 shows how original ray casting and our method
differ in the rendering process. A black circle stands for a pixel whose color and
z-value are already calculated in the depth-subsampling step (that is, a corner
pixel). A white one means a non-corner pixel. A gray circle is a starting point of
ray traversal. So we can reduce the rendering time by an amount corresponding to
the potentially empty space (hatched region).

Fig. 3. A comparison of original ray casting (left) and our method (right) in ray advance



The most important factor in this method is the pixel block size, S. It directly affects
rendering speed and image quality, so we have to choose an optimal value of S to
achieve maximum performance without deteriorating image quality. Let the average
time to compute the color and z-value for each corner pixel be $t_{minz}$, the time for
traversing a ray in the original method be $t_{rt}$, and that for our method be $t_{minrt}$. Then the
rendering time of original ray casting, $t_{old}$, and of our method, $t_{new}$, can be defined as follows:

$$t_{old} = t_{rt} N^2$$

$$t_{new} = t_{minz}\left(\frac{N}{S}\right)^2 + t_{minrt}\left(N^2 - \left(\frac{N}{S}\right)^2\right) \qquad (1)$$


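Since the additional time to determine the minimum depth value and store it in the Z-buffer
is negligible, $t_{minz}$ is almost the same as $t_{rt}$. We can estimate the average time saved,
$t_{save}$, as follows.

$$t_{save} = t_{old} - t_{new} = t_{rt} N^2 - t_{minz}\left(\frac{N}{S}\right)^2 - t_{minrt}\left(N^2 - \left(\frac{N}{S}\right)^2\right) \approx (t_{rt} - t_{minrt})\left(N^2 - \left(\frac{N}{S}\right)^2\right) \qquad (2)$$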

where $t_{rt} - t_{minrt}$ is regarded as the average gain for each ray using our method and is
denoted $t_{gain}$. If $t_{gain}$ is assumed constant, $t_{save}$ increases with the value of S.
However, as the value of S gets larger, spatial coherence decreases and $t_{gain}$ becomes
smaller, as shown in Figure 4. Since ray traversal is performed only in the gray region,
we can reduce the time in proportion to the amount of hatched region (PES'). In this case,
$t_{gain}$ is inversely related to the value of S. Consequently, we have to choose the optimal
value of S so as to minimize the rendering time.
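
The cost model can be evaluated numerically to see this trade-off. The sketch below is a hedged illustration: the function name and the sample timings are assumptions, and it takes $t_{minz} = t_{rt}$ by default, as the text argues the difference is negligible.

```python
from typing import Optional, Tuple


def render_times(n: int, s: int, t_rt: float, t_minrt: float,
                 t_minz: Optional[float] = None) -> Tuple[float, float, float]:
    """Return (t_old, t_new, t_save) for an n x n image and block size s.

    t_rt    : average time to traverse one ray in original ray casting
    t_minrt : average time to traverse one ray that starts at the block's
              minimum depth
    t_minz  : average time per corner pixel; defaults to t_rt
    """
    if t_minz is None:
        t_minz = t_rt
    corner_rays = (n / s) ** 2
    t_old = t_rt * n ** 2
    t_new = t_minz * corner_rays + t_minrt * (n ** 2 - corner_rays)
    return t_old, t_new, t_old - t_new


# Illustrative numbers only: per-ray cost halves when starting at d_min.
t_old, t_new, t_save = render_times(512, 4, t_rt=1.0, t_minrt=0.5)
```

With a fixed per-ray gain, `t_save` grows as `s` grows because fewer corner rays are cast. In practice `t_minrt` itself depends on `s`, so a realistic search for the best block size would measure it per value of `s` rather than hold it constant.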

Fig. 4. Relationship between the pixel block size and the average time to be reduced



3 Efficient Space-Leaping Using Depth-Subsampling 

The performance of space leaping may degrade depending on the viewing specification.
When the camera is close to the object boundary and the camera orientation is almost
parallel to the surface's tangent, rendering time gets longer since a ray cannot jump over
the empty space in a single leap. Instead, the ray has to jump several times to reach
the surface boundary, as depicted in Figure 5. This is because original space leaping
relies only on the distance between a voxel and its nearest boundary voxel. We can
solve this performance degradation by taking into account image-space
information (the subsampled Z-buffer) as well as object-space information (the distance map).




Fig. 5. An example of inefficient space-leaping. A ray has to jump several times to access the 
object in this viewing situation 



Our method partitions the image plane into several uniform pixel blocks, then determines
the minimum depth of the four corner pixels of each pixel block, just as in the accelerated
ray casting. While the original method starts to jump from the corresponding pixel in the
view plane, our method moves the starting point of leaping forward by the minimum
depth value.
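
The effect of moving the starting point can be sketched as follows. This is an illustrative model, not the authors' code: the ray is parameterised by a scalar `t`, `distance_at` is a hypothetical lookup into the distance map along that ray, and the grazing-ray distance profile is invented for the example.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def leap_start(eye: Vec3, direction: Vec3, d_min: float) -> Vec3:
    """Move the start of leaping forward along the ray by the block's minimum depth."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("ray direction must be non-zero")
    return tuple(e + d_min * c / norm for e, c in zip(eye, direction))


def count_leaps(t_start: float, distance_at: Callable[[float], float],
                eps: float = 0.5, max_leaps: int = 10_000) -> Tuple[float, int]:
    """Leap along the ray until the distance map reports the surface within eps."""
    t, leaps = t_start, 0
    while leaps < max_leaps:
        d = distance_at(t)
        if d <= eps:
            break
        t += d  # jump by the distance to the nearest boundary voxel
        leaps += 1
    return t, leaps


def grazing(t: float) -> float:
    """Hypothetical grazing ray: a wall 2 units to the side, surface hit at t = 100."""
    return min(2.0, 100.0 - t)


plain = count_leaps(0.0, grazing)      # leaping from the view plane
advanced = count_leaps(90.0, grazing)  # leaping from a minimum depth of 90
```

In this toy case the ray needs many short leaps from the view plane but only a few once the start is advanced, which mirrors the situation in Figure 5.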

4 Experimental Results

Virtual endoscopy is a non-invasive diagnosis method based on computer processing 
of 3D image data sets such as CT scans in order to provide visualizations of inner 
structures of the human organ cavity [17-19]. It is a very good example to verify the 
performance enhancement of a perspective volume rendering method since it requires 
real-time generation of high-quality perspective images. 

We compare the rendering time and image quality of original volume ray casting
(RC), space leaping (SL) and our algorithm. All of these methods are implemented on
a PC equipped with a Pentium IV 2.2 GHz CPU, 1 GB of main memory, and an NVIDIA
GeForce4 graphics accelerator. The volume dataset is obtained by scanning a human
abdomen with a multi-detector CT at a resolution of 512 × 512 × 541.




[Figure 6: two line plots of rendering time. Left panel: RC and our method. Right panel,
titled "Space Leaping": SL and our enhanced method. Horizontal axis in both panels:
grid size S (pixels), with ticks RC or SL, 2, 4, 8, 16, 32. Series: 512 × 512 and 256 × 256.]

Fig. 6. Left graph shows rendering time of original ray casting (RC) and our method (S = 2 to 32),
and right graph shows that of space leaping (SL) and our enhanced method



We measure the rendering time at several different points in the colon cavity under
fixed viewing conditions. In order to estimate the influence of image resolution on
rendering speed and image quality, we render the dataset at resolutions of 256 × 256
and 512 × 512. Figure 6 (left) compares the average rendering time of original
ray casting and our method. As the value of S increases, rendering time gets
shorter since $t_{save}$ grows with the pixel block size. However, when the pixel block
becomes larger than a specific value (S = 4), the rendering time increases
slightly due to the lack of spatial coherence, as mentioned in Section 2. So we have to
determine the optimal value of S according to the volume data and final image size. For
example, when we set the pixel block size to 4 and the image size to 512 × 512, the
rendering time of our method is only about 39% of that of original ray casting. Figure 6
(right) compares the average rendering time of conventional space leaping
(SL) and our method. As the value of S increases, rendering speed increases until
S reaches a specific value. When we set S to 4, the rendering time of our
method is about 47% of that of original space leaping. We therefore conclude that our
method roughly doubles the rendering speed in comparison to conventional rendering
methods.

Figure 7 shows the quality of images produced by the original ray casting method and
by ours as the pixel block size increases from 4 to 16 under a fixed viewing condition. It is
very hard to recognize any difference between the images from the two methods. Therefore
we conclude that our method renders volumetric scenes without noticeable loss of image
quality.

Fig. 7. A comparison of image quality of virtual colonoscopy in several different regions:
leftmost column shows images produced by original ray casting and the remaining ones depict
images produced by our method when S = 4, S = 8, and S = 16 from left to right



5 Conclusion 

The most important issue in volume rendering for medical image viewing is to produce
high quality images in real time. We propose an efficient ray casting and space-leaping
method that reduces the rendering time in comparison to the conventional
algorithms without noticeable loss of image quality. Using the depth-subsampling
scheme, our method moves the starting point of ray traversal forward by
the minimum depth value calculated in the subsampling step. This method reduces
the rendering time of space leaping even when the camera is close to the object surface and its
direction is almost parallel to the tangent vector of the surface. It can be applied to generate
endoscopic images for any kind of tubular organ. Experimental results show
that it normally produces images as high in quality as those of ray casting and takes less time for
rendering.


Abstract. In virtual endoscopy, fast and accurate path generation and navigation
is the most important feature. Previous methods build a center line of the organ
cavity by connecting consecutive points maximally distant from the organ wall.
However, they demand a lot of computation and extra
storage for spatial data structures. We propose an efficient path computation
algorithm. It determines the camera direction for the next frame by identifying the ray that
has the maximum depth in the current frame. The camera position is determined using
the center of gravity of the organ's cross-section. The entire camera path can be
constructed by applying this operation in every frame. It requires neither preprocessing
nor extra storage since it depends only on image-space information generated
at rendering time.



1 Introduction 

Optical endoscopy is a less-invasive diagnosis method. We can directly examine the
pathologies of internal organs by inserting an endoscopic camera into the human body. It
offers higher quality images than any other medical imaging method. However, it has
several disadvantages: it causes patients discomfort, has a limited range of exploration, and
can have serious side effects such as perforation, infection and hemorrhage.

Virtual endoscopy provides visualizations of the inner structure of human organs
that have a pipe-like shape, such as the colon, bronchus and blood vessels [1]. Since virtual
endoscopy is a non-invasive examination, it causes no discomfort or side effects. In
order to implement virtual endoscopy, we should devise a method that produces high
quality perspective images within a short time. However, it is more important to avoid
collision between the virtual camera and the organ wall, and to let the camera move smoothly
through the human cavity. Therefore, fast and accurate path generation is essential for
efficient diagnosis using virtual endoscopy. Previous methods mainly depend on
object-space information such as a distance map or potential field. Since the entire
volume data must be considered to produce those spatial data structures, a lot of
computation in the preprocessing step and extra storage are required.

In this paper, we propose an efficient camera path generation and navigation
algorithm that computes the camera position and orientation for the next frame while
rendering the scene in the current frame. It determines the camera orientation for the next frame using
the ray that has the maximum distance in the current frame, and it calculates the camera position in
the next frame using the center of gravity of the organ region on the cross-sectional image. Since
this process can be done without extra cost during ray casting, it computes an accurate
camera path in real time.

Related work is summarized in the next section. In Section 3, we present our 
method in detail. Experimental results and remarks are shown in Section 4. Lastly, we 
summarize and conclude our work. 



2 Related Works 

Volume ray casting is the best-known volume rendering method [2]. After firing a ray from
each pixel on a view plane, it computes color and opacity at sample points along the
ray and determines the final color by blending those sample values. Although it takes a
long time to make an image due to the randomness of its memory reference pattern, it can
produce high-quality images in comparison to other methods and provides perspective
rendering for generating endoscopic scenes.

In order to implement a realistic virtual environment, several kinds of interactions
between objects should be considered. In particular, collision avoidance and
fly-through along the center line are more important than any other components in virtual
endoscopy. Some spatial data structures can be used for collision avoidance. An
occupancy map has the same resolution as the volume dataset, and each cell of the map
stores identifiers for the objects that occupy the cell [3]. As an object changes its position
or size, the value of each cell should be updated. If a cell has two or more identifiers,
it is regarded as a collision of those objects. Since the method has to update cell data
whenever an object changes its position or size, it is not adequate for real-time
applications. A distance map [4] is a 3D spatial data structure that has the same resolution
as its volume data, and each point of the map holds the distance to the nearest boundary
voxel. The smaller the distance value, the larger the collision probability. However,
this method requires a long preprocessing time for the 3D distance transformation.

Navigation methods for virtual endoscopy can be classified into three categories:
manual navigation [5], planned navigation [6] and guided navigation [7,8]. In manual
navigation, we can directly control the virtual camera to observe anywhere we want
to examine. However, the user might feel discomfort since camera movement is
entirely dependent on the user's control, and collision may occur if the user misguides the
camera. Planned navigation calculates the entire path in preprocessing time using several
path generation algorithms, then moves the camera along the pre-calculated path. It
can fly through the desired area without user intervention. However, it requires a lot of
computation in the preprocessing step and does not allow users to control the camera
intuitively. Guided navigation is a physically-based method, which builds a spatial data
structure such as a potential field in the preprocessing step and determines the camera
orientation by considering an attractive force directed toward the target point, a repulsive force from
the organ surface, and the user's input. It guarantees that the camera arrives at the target point and
can move the camera anywhere the user wants to place it without collision against the organ
wall. However, it is hard to implement and it is costly to build and maintain
the potential field.


3 Navigation Method Using Image-Space Information

The guided navigation methods mentioned in the previous section are regarded as an
object-space approach since they exploit overall information about the entire volume dataset while
building spatial data structures. Even though this approach is guaranteed to generate a reliable
navigation path, it takes a lot of preprocessing time. In virtual endoscopy, in general,
information for only a small part of the entire volume is required since the field of view is restricted to
cavities of the volume dataset. Our method computes the camera position and orientation
considering only some voxels in the view frustum, thus we call it an image-space approach.

A navigation path can be defined as a pair of functions, $P = \{\delta(t), \varphi(t)\}$, where
$\delta(t)$ returns the camera direction at a specific frame t, and $\varphi(t)$ returns the camera position.
Our method determines the camera orientation for the next frame $t_{i+1}$ using depth
information obtained in the rendering step of the current frame $t_i$, and the camera position for the next
frame using the center of gravity of the organ cavity's cross-section. Figure 1 shows
the procedure for determining the entire navigation path. The rendering
pipeline is the same as in the conventional volume ray casting method. Our method has
another pipeline to specify the center-line of the organ cavity.

Fig. 1. A procedure for determining navigation path (legend: rendering pipeline; collision
avoidance and center-line detection pipeline)



Figure 2 shows in detail the procedure to determine the camera position and orientation in the
next frame. Volume ray casting fires a ray from each pixel on the view plane,
then computes color and opacity at sample points along the ray and determines
the final color by blending them. While traversing a ray, we can compute the distance to
the nearest non-transparent voxel by multiplying the number of samples by the unit distance
between two consecutive samples. This distance can be regarded as the depth value to
the organ wall. Let $d_{ij}$ be the depth value of a pixel $p_{ij}$. We find the direction of the ray
fired from the pixel that has the maximum depth value as follows:

$$R_{max} = f\left(\max_{0 \le i < M,\ 0 \le j < N} d_{ij}\right) \qquad (1)$$

where M and N are the horizontal and vertical resolution of the image and f is a function
mapping the depth value of a pixel to its corresponding ray vector $R_{ij}$. We set the
direction of the ray fired from the pixel that has the maximum depth value $d_{max}$ as a
unit vector, denoted $R_{max}$ (see Figure 2 (top)).

Depth values reflect the possibility of collision in the organ cavity. A larger depth value
means that obstacles are located far away from the current camera position. Conversely,
the smaller the depth value, the higher the collision possibility, since the camera is
close to the organ wall. $VPN_i$ is a unit vector that represents the view plane normal for
the current frame $t_i$. We set the view plane normal for the next frame, $VPN_{i+1}$, to $R_{max}$ to
minimize the probability of collision. We then move the view plane by the unit
distance ($\Delta$) in that direction, as depicted in Figure 2 (middle).
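
The direction update can be sketched as follows. This is a hedged Python illustration: `ray_of` is a hypothetical stand-in for the mapping f from a pixel to its ray vector, the depth map is a plain list of rows, and ties between equal maximum depths are resolved by taking the first pixel found, which the text does not specify.

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def next_view_plane_normal(depth: Sequence[Sequence[float]],
                           ray_of: Callable[[int, int], Vec3]) -> Tuple[float, Vec3]:
    """Return (d_max, R_max): the largest depth and the unit ray through that pixel."""
    best_d, best_ij = float("-inf"), None
    for j, row in enumerate(depth):
        for i, d in enumerate(row):
            if d > best_d:
                best_d, best_ij = d, (i, j)
    if best_ij is None:
        raise ValueError("depth map is empty")
    rx, ry, rz = ray_of(*best_ij)
    norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    return best_d, (rx / norm, ry / norm, rz / norm)


def advance(point: Vec3, direction: Vec3, delta: float) -> Vec3:
    """Move a point (for example the view plane origin) by delta along direction."""
    return tuple(p + delta * c for p, c in zip(point, direction))


# Toy 3 x 2 depth map and a made-up pixel-to-ray mapping, for illustration only.
toy_depth = [[1.0, 2.0, 3.0],
             [4.0, 9.0, 5.0]]
d_max, r_max = next_view_plane_normal(toy_depth, lambda i, j: (3.0 * i, 0.0, 4.0 * j))
moved = advance((0.0, 0.0, 0.0), r_max, 5.0)
```

Because the depth values are already produced while rendering the current frame, the only extra work is the maximum search over the depth map.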

After determining the viewing direction of the next frame, we have to calculate the new
camera position, $COP_{i+1}$, as shown in Figure 2 (bottom). This is the center of gravity
of the organ cavity region, which is composed of connected pixels that have minimum intensity in
the view plane. In the following equation, COG(x, y) gives the center of gravity for the
next frame.

$$COG(x, y) = \left(\frac{1}{N}\sum_{i=0}^{N-1} x_i,\ \frac{1}{N}\sum_{i=0}^{N-1} y_i\right) \qquad (2)$$

where N is now the number of pixels corresponding to the organ cavity in the projected area and $x_i$,
$y_i$ are the coordinates of each pixel. We calculate the center of gravity on the
projection plane by averaging the coordinates of all pixels in the organ cavity area.
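
Equation (2) amounts to averaging pixel coordinates over the cavity region. A minimal sketch, assuming the cavity has already been segmented into a binary mask (the connectivity test mentioned in the text is not reproduced here):

```python
from typing import Sequence, Tuple


def center_of_gravity(mask: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Average the (x, y) coordinates of all pixels flagged as organ cavity.

    mask[y][x] is truthy for cavity pixels; how the mask is obtained
    (thresholding, connected-component labelling) is outside this sketch.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("no cavity pixels in the mask")
    return sum_x / count, sum_y / count


toy_mask = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
cog = center_of_gravity(toy_mask)
```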



4 Experimental Results 

In order to check whether our method generates a camera path along the center line and
provides reliable navigation through the organ cavity, we compare navigation paths made
by an object-space method using a 3D distance field and by our image-space method. These
methods are implemented on a PC equipped with a Pentium IV 2.6 GHz CPU, 1 GB of
main memory, and an ATi RADEON 8500 graphics accelerator. The volume data used
for the experiment is a clinical volume obtained by scanning a human abdomen with a
multi-detector CT at a resolution of 512 × 512 × 541. Since the human colon has a
complex pipe-like structure, virtual colonoscopy is a good example on which to apply our
method.

Figure 3 shows how the paths generated by the object-space method using 3D
distance transformation (denoted as DIST) [9] and by our method (denoted as OURS) differ.
Since DIST is known as an accurate center-line computation method, the difference
between those paths reflects the accuracy of our method.

Fig. 2. A procedure to determine the camera position and orientation in the next frame. (top)
Calculate the maximum depth value and set $R_{max}$ to the ray that has the maximum depth in the
current frame. (middle) Move the view plane forward by the unit distance ($\Delta$) along
$R_{max}$ and set $VPN_{i+1}$ to $R_{max}$. (bottom) Compute the center of gravity and set it as
$COP_{i+1}$ for the next frame

Fig. 3. A comparison of navigation paths produced by DIST (left) and OURS (right)




Fig. 4. Test region in colon. A straight region (A) and a curved region (B) 

The distance-based path generation method (DIST) takes 260 sec in its preprocessing
step. Our image-space method needs no preprocessing.

Several regions are carefully selected for the accuracy test. As shown in Figure 4, we
made navigation paths on a straight region and a curved region using the two methods.
Our method is suitable for any pipe-like structure without branches.

The two graphs in Figure 5 show the difference between the center of gravity (OURS) and
the maximum distance point (DIST) on region A and region B in the colon. The average
radius of the colon cavity in a cross-sectional image is about 20 to 30 voxels. Experimental
results show that the center of gravity points are very close to the center-line of the organ
cavity generated by the distance-based method. The average distance between a center
of gravity and the colon center is 1.7 voxels in the straight region and 1.9 voxels in the curved
region. This indicates that our method generates a camera path that stays close to the
center-line of the organ cavity.



Table 1. The average and the standard deviation of distance (voxels)

                      Region A    Region B
Average               1.7         1.9
Standard deviation    0.8         1.5



Fig. 5. A distance between center of gravity (in OURS) and a position maximally distant from 
organ wall (in DIST) is measured in consecutive frames on a straight region (top) and a curved 
region in colon (bottom). A path for DIST is represented as yellow line, and that for OURS is 
depicted as red line in cross-section images on right column 

Figure 6 shows consecutive frames for region A while applying our navigation
method. It allows users to control the virtual camera conveniently and to examine
a wider area of the colon since the camera lies on the center point of the colon cavity.


Fig. 6. Virtual navigation in human colon with our method


5 Conclusion

We proposed an efficient path generation method using depth values and the center of
gravity of the cross-section. We obtain the camera direction of the next frame from the ray
that has the maximum distance in the current frame, and the camera position is calculated
using the center of gravity of the cross-section. Our method can generate a reliable camera
path, and it requires neither a preprocessing stage nor extra storage for maintaining
spatial data structures. Consequently, it helps doctors to perform fast and accurate
diagnosis.


Abstract. Hair motion simulation in computer graphics has attracted
many researchers. The application we have developed has been inspired by
related previous work as well as our own efforts in finding useful algorithms
to handle this problem. The work we present uses a set of representations,
including hair strands, clusters and strips, that are derived from the same
underlying base skeleton, where this skeleton is animated by physical, i.e.
spring, forces.



1 Introduction 

One of the most exciting and challenging research areas in CG is hair motion
simulation. The challenge is due to the fact that a human head has up to 160,000
individual hair strands and simulating each of them in a straightforward manner is
quite costly and tedious. Therefore, one should come up with efficient algorithms.

One idea is to use the level of detail (LOD) approach, a popular method in CG modeling.
The hair can be represented as segments where each segment might be either a
hair strand, a cluster or a strip. Depending on the distance between the viewer and
the animated head, the corresponding level switches are performed.

In addition to the LOD usage, there is another useful idea: the structure behind
the modeling of these segments. All the segments are derived from the same
base hair skeleton structure, and the dynamics of this structure is based on the applied
physical forces, such as spring, gravitation, friction, absorption and repulsion forces.

This paper aims to present the work we have carried out to add these features to
our hair motion simulation application. We should also produce a human head which
is ready to host the hair. Thus, this problem can be described as the modeling, dynamics
and rendering of human hair and head in real time.

2 Solution and Design Decisions 

The solution to this problem consists of two parts: modeling, rendering and
animating the human head, and modeling, rendering and animating the human hair. First,
a human head model for hair placement is prepared,
making use of an auxiliary software package called 3DS Max [1]. Then, the
physical laws and the base skeleton structure for the hair are prepared. Finally, the hair
part is implanted on the scalp of the head part.

2.1 Human Head

The procedure to obtain a human head requires modeling and rendering it. An existing
head model is loaded into the 3DS Max software environment, and some
transformations and a crucial operation, scalp mesh separation, are applied to it.
Once the scalp mesh has been separated and some basic transformations, i.e. rotation and
translation, have been added to the head, the hair can be implanted and
moved.

Tasks of Scalp Mesh Triangles. Each scalp mesh triangle is associated with the 
following tasks: 

1. Determining the implanting point of the hair segment (rooting)

2. Determining the density of hair (density) 

3. Detecting collision between scalp and hair segment (collision) 

Rooting: The root of each hair segment is the center of gravity (COG) of the
corresponding scalp mesh triangle [2]. Having obtained the scalp mesh triangles, new
hair segments originating from the COG of these triangles are generated. Since each
hair segment is fairly long, the segments originating from a higher triangle
will also cover regions of some lower triangles. As the number of segments increases,
some of the lower-level triangles will be automatically hidden by the long hairs
originating from above. Hence, for the sake of performing less computation in real time,
the rooting of hair from lower triangles is sometimes omitted.

Density: If the number of hair segments is equal to the number of scalp mesh triangles,
the rooted hair might be sparse. In order to handle this problem, a given triangle is
recursively subdivided and new hair segments are originated from the COG of the
newly created ones [2], as demonstrated in Figure 1.
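
Rooting and density can be sketched together as follows. This is an illustration under an explicit assumption: the text does not state which subdivision scheme is used, so the sketch uses midpoint (one-to-four) subdivision, and all function names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def centroid(tri: Triangle) -> Vec3:
    """Center of gravity of a triangle: the mean of its three vertices."""
    a, b, c = tri
    return tuple((pa + pb + pc) / 3 for pa, pb, pc in zip(a, b, c))


def midpoint(p: Vec3, q: Vec3) -> Vec3:
    return tuple((a + b) / 2 for a, b in zip(p, q))


def subdivide(tri: Triangle) -> List[Triangle]:
    """Assumed scheme: split a triangle into four by joining its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def root_points(tri: Triangle, levels: int) -> List[Vec3]:
    """Hair roots for one scalp triangle after `levels` recursive subdivisions."""
    if levels == 0:
        return [centroid(tri)]
    roots: List[Vec3] = []
    for child in subdivide(tri):
        roots.extend(root_points(child, levels - 1))
    return roots


scalp_triangle = ((0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 6.0, 0.0))
one_root = root_points(scalp_triangle, 0)
denser = root_points(scalp_triangle, 2)
```

Under this scheme each subdivision level multiplies the number of roots per triangle by four, so the density can be tuned with a single integer.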




Fig. 1. Increasing number of roots via subdivision process 

It should be noted that when the triangle is close to equilateral
(well-proportioned), the rooting points are evenly distributed over
the shape. However, when the triangle sides are badly proportioned, the rooting
points pile up predominantly in one region of the shape. The latter case is undesirable,
and therefore the scalp mesh triangles should be generated to be as
well-proportioned as possible.

Collision: The last task of the scalp mesh triangles is collision detection between
themselves and the corresponding hair segment. To do this, the normal of each triangle (to decide
collision) and some physical constants (i.e. absorption, friction, repulsion; to decide
the response forces in case of a collision) are necessary. Since each
hair segment is represented by its masses¹, the location of every mass is compared
with a single scalp triangle. In case of a collision between a given triangle and a hair
segment mass point, the following collision response operations are performed:

1. Copy the velocity of the mass, then set its components $v_x$, $v_y$, $v_z$ to 0 in
order to prevent the mass from moving further into the head.

2. Apply the friction force, using the physical friction constant of the triangle material, to
the mass. The force is proportional to the magnitude of the velocity (copied
before the velocity was set to 0) of the collided mass and is applied in
the direction of the velocity vector of the mass to reduce the repulsion effect.

3. Apply the absorption force, using the physical absorption constant of the triangle
material, to the mass. The force is proportional to the magnitude of the velocity of
the collided mass and is applied in the direction of the velocity vector of the mass
to reduce the repulsion effect.

4. Apply the repulsion force, using the physical repulsion constant of the triangle
material, to the mass. The force is proportional to the magnitude of the velocity of the
collided mass and is applied in the opposite direction of the velocity vector of the
collided mass to push the mass away from the triangle.
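
The four response steps can be sketched as follows. This is a minimal illustration, not the authors' code: the `Mass` class, the function name and the constant values are hypothetical, and the three forces are simply accumulated into one force vector on the mass.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Mass:
    velocity: Vec3
    force: Vec3 = (0.0, 0.0, 0.0)


def respond_to_collision(mass: Mass, k_friction: float,
                         k_absorption: float, k_repulsion: float) -> None:
    # Step 1: copy the velocity, then zero it so the mass stops moving into the head.
    v = mass.velocity
    mass.velocity = (0.0, 0.0, 0.0)
    speed = math.sqrt(sum(c * c for c in v))
    if speed == 0.0:
        return  # no incoming motion, so no response force
    unit = tuple(c / speed for c in v)
    # Steps 2 and 3: friction and absorption act along the copied velocity.
    along = (k_friction + k_absorption) * speed
    # Step 4: repulsion acts against the copied velocity.
    against = k_repulsion * speed
    net = along - against
    mass.force = tuple(f + net * u for f, u in zip(mass.force, unit))


# A mass moving into the scalp along -z (constants chosen for illustration).
hit = Mass(velocity=(0.0, 0.0, -2.0))
respond_to_collision(hit, k_friction=0.1, k_absorption=0.2, k_repulsion=1.0)
```

With a repulsion constant larger than the sum of the other two, the net force points away from the triangle, while friction and absorption only damp the bounce.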

Considering that there are roughly 80 mass points in one hair segment and that each of
these points must be compared with the candidate triangles during collision detection,
the computation becomes expensive as the number of triangles in the scalp
increases. Therefore, some sparseness is left in the scalp and those sections are treated
as rough bounding rectangles.

2.2 Human Hair 

There are three main aspects of human hair simulation, stated in [3] as hair shape
modeling, hair dynamics (animation) and hair rendering.

Hair modeling involves the exact or approximate creation of individual hairs. Since there
are approximately 160,000 individual hair strands on a given human scalp, the modeling
should be performed in such a way that the resulting animation of the selected
model is efficient and fast.

Hair dynamics involves the animation of the hair. Several forces are applied to
particular parts of the hair in order to accomplish the animation.

Hair rendering concerns the appearance of simulated hair from the viewer's
point of view. Several rendering techniques are applied to the hair for realism.

Hair Modeling. There are three hair modeling schemes dominant in this research 
area: Strand Hair Model (SHM) introduced in [4, 5]; Multiresolution Hair Model 
(MHM) introduced in [6]; Modeling with Level-of-Detail Representations (LOD) 
introduced in [7-9]. 

Strand Hair Model (SHM): SHM is the simplest way to model the hair. Each individual
hair strand is explicitly designed in this scheme [5]. The details can be seen in
Figure 2.



¹ Masses are the nodes of the base skeleton structure.

Fig. 2. The process of modeling an individual hair strand in SHM

After applying the related transformations, namely tapering and then bending,
to the initial cylinder object, an individual hair strand is obtained.

Multiresolution Hair Model (MHM): MHM is an admissible model to represent hair
since it is based on the multiresolution concept, whose major benefit is the user's
freedom to choose the appropriate level of detail for a desired model manipulation.
Multiresolution manipulations for hair modeling are achieved with a hierarchy of
generalized cylinders. A hair design is created with a small set of clusters,
roughly 30 clusters per human head. Subdividing these clusters yields the submodel
of MHM, which is the Cluster Hair Model (CHM). Putting it all together, MHM combines
the benefits of CHM and SHM by allowing local control as well as global control.
However, MHM is complex to implement because it is designed for interactive hair modeling.
Considering that our project does not deal with interactively creating complex hair models,
we have concentrated on the following method instead of using MHM.

Modeling with LOD Representations: Taking the basic idea from [7], we introduced
some new algorithms. The preferred approach, LOD, uses three novel representations
based on a base skeleton to create levels of detail (LODs) for hair. These are strips,
clusters and individual strands:

Strips: A strip is the coarsest (lowest) LOD used for hair modeling. It is typically used
to represent the innermost layers of hair and it is responsible for the global physical
behavior and the volume of the hair during the simulation. The transition (switching)
to this LOD representation is performed when the viewing distance to the human
head increases.

In order to model the strip, the base skeleton structure is used. Starting from the top
skeleton node, we proceed towards the bottom, placing 2 control points for
each node point encountered. It is important to notice that the 2 control points
and the related node point are collinear. Having obtained the global surface in this way, we
subdivide it to obtain a smooth representation. The subdivision surface that is
obtained for one strip is shown in Figure 3.

Fig. 3. Strip represented as a subdivision surface

Fig. 4. Process of obtaining a cluster

Clusters: A cluster is used to represent a group of hairs. This LOD improves performance
since, instead of animating many hair strands close to each other, we animate
just one object that represents their total behavior.

In addition to the base skeleton structure, the generalized cylinder concept
is used to model a cluster. The generalized cylinder is obtained as follows:
Initially three identical base skeletons are created; one on the left, one on the right
and the other on the back (Figure 4). The corresponding nodes of each skeleton are
joined, forming a surface to which our texture can be easily mapped. The texture
gives the illusion of a group of roughly 30 strands. Hence, by applying forces
to just 3 skeletons, a group motion of 30 strands is obtained. This adds
volume to the hair and reduces sparseness at the cost of animating just three
skeletons.

Individual Strands: An individual strand is used to model a single hair strand and can
be seen only when the viewer is very close to the head. Switching to this LOD takes
place when the distance between the viewer and the head is sufficiently decreased.

As in all other LODs, individual strands are based on the base skeleton. In fact,
the base skeleton itself is directly the hair strand. By connecting each pair of
consecutive masses of the skeleton with a line segment, the hair strand is
automatically obtained (Figure 5).
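The distance-based switching between the three LODs can be sketched as follows. This is only an illustration: the function name and the two threshold distances are hypothetical, and the assumption that clusters cover the intermediate range is ours, since the text only states that strips are used at large distances and strands at very small ones.

```python
def select_lod(distance, near=1.0, far=5.0):
    """Pick a hair LOD from the viewing distance.

    `near` and `far` are hypothetical thresholds (arbitrary units);
    the paper does not give concrete values.
    """
    if distance < near:
        return "strands"   # finest LOD, viewer very close to the head
    if distance < far:
        return "clusters"  # assumed intermediate LOD
    return "strips"        # coarsest LOD, viewer far from the head


for d in (0.5, 2.0, 10.0):
    print(d, select_lod(d))
```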




Fig. 5. Process of obtaining an individual strand 

Hair Dynamics. Hair dynamics (animation) is closely related to the base skeleton
structure. The base skeleton is defined as a structure consisting of point-like masses
connected to each other via line segments (Figure 5). Hair dynamics is performed
by applying forces to each mass of the base skeleton structure; the term mass
is appropriate because Newton's second law (F = ma) is applied to those masses. The
applied forces change the position of each mass accordingly. Changing the
positions of the masses changes the position of the base skeleton and therefore the
position of the active LOD representation derived from the moved base skeleton.



Moving Base Skeleton. Moving the base skeleton also moves the derived
LOD representation. Thus, understanding how this skeleton moves explains
the dynamics involved.

The source of a movement is the applied forces. The net force applied on a mass, F,
determines its acceleration through F = ma, where m is predefined. From the new
acceleration, the new velocity is obtained, and from the new velocity the displacement
that should be added to the current position of that particular mass is obtained.

There are two types of forces (main forces) that are always applied regardless of
the animation state. Spring forces: Two consecutive masses exert spring forces
on each other. Gravitational forces: The usual gravitational force is applied to each
mass.

Besides, there are four more forces (extra forces) that are applied when necessary:

1. Air Friction: As long as a mass is moving (velocity is not 0), an air friction force is
applied in the opposite direction of the velocity vector, with magnitude proportional
to the velocity of the mass.

2. Scalp Friction: As long as a mass is colliding with a triangle or a bounding rectangle
of the scalp mesh, a friction force is applied in the direction of the velocity vector,
with magnitude proportional to the velocity of the mass.

3. Scalp Absorption: As long as a mass is colliding with a triangle or a bounding
rectangle of the scalp mesh, an absorption force is applied in the direction of the
velocity vector, with magnitude proportional to the velocity of the mass.

4. Scalp Repulsion: As long as a mass is colliding with a triangle or a bounding
rectangle of the scalp mesh, a repulsion force is applied in the opposite direction of the
velocity vector, with magnitude proportional to the velocity of the mass.
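As an illustration of how the two main forces and the air friction force might be accumulated for one mass of a strand, consider the minimal sketch below. It is not the authors' implementation: the Hooke spring model, the function names, and all parameters (`k`, `rest_length`, `c_air`, `g`) are assumptions chosen for the example, and the scalp forces are omitted because they depend on collision detection.

```python
import math


def spring_force(p, q, rest_length, k):
    """Hooke spring force on the mass at p exerted by its neighbour at q."""
    d = [qi - pi for pi, qi in zip(p, q)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        return [0.0, 0.0, 0.0]
    scale = k * (length - rest_length) / length  # > 0 pulls p towards q
    return [scale * c for c in d]


def gravity_force(mass, g=9.81):
    """Gravity along the negative y axis (axis choice is an assumption)."""
    return [0.0, -mass * g, 0.0]


def air_friction(velocity, c_air):
    """Drag opposite to the velocity, magnitude proportional to speed."""
    return [-c_air * v for v in velocity]


def net_force(i, positions, velocities, mass, rest_length, k, c_air):
    """Sum gravity, springs to both neighbours, and air friction for mass i."""
    total = gravity_force(mass)
    for j in (i - 1, i + 1):
        if 0 <= j < len(positions):
            f = spring_force(positions[i], positions[j], rest_length, k)
            total = [a + b for a, b in zip(total, f)]
    drag = air_friction(velocities[i], c_air)
    return [a + b for a, b in zip(total, drag)]


positions = [[0.0, 0.0, 0.0], [0.0, -2.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(net_force(0, positions, velocities, 1.0, 1.0, 10.0, 0.5))
```

In this example the spring between the two masses is stretched to twice its rest length, so the upper mass is pulled downward by both the spring and gravity.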

The visualization of the forces mentioned above is given in Figure 6.

Fig. 6. Visualized representation of forces

In each frame of the animation, the current total force acting on each
mass is applied using Newton's law, F = ma. Since F is known from the forces
of Figure 6, and m, the predefined constant, is known, the acceleration a is computed.
Knowing the acceleration, the change in velocity is computed using Δv = a Δt,
where a is the acceleration just found and Δt is the predefined time step.
Adding this Δv, the velocity gained or lost, to the current v gives the new velocity.

Knowing the new velocity, the change in position is computed using
Δx = v Δt, where v is the velocity just found and Δt is the time step.
Adding this Δx, the distance travelled, to the current x gives the desired new position.
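The per-frame update described above can be sketched as a single function. Because the position update uses the velocity just computed, the scheme corresponds to what is usually called semi-implicit (symplectic) Euler integration; the function name and the list-based vector representation are assumptions made for this example.

```python
def euler_step(position, velocity, force, mass, dt):
    """Advance one mass by one frame: a = F/m, v += a*dt, x += v_new*dt."""
    acceleration = [f / mass for f in force]
    new_velocity = [v + a * dt for v, a in zip(velocity, acceleration)]
    # The position update uses the velocity just found, as in the text.
    new_position = [x + v * dt for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


pos, vel = euler_step([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      force=[0.0, -2.0, 0.0], mass=2.0, dt=0.5)
print(pos, vel)
```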

The scalp friction-absorption-repulsion forces need to detect collision. Another 
collision detection is necessary between the hair segments themselves. All of these 
collisions will change the net force when they are detected and the changing net force 
will obviously change the dynamics (movement) of the hair. 

Collision Detection. There are two types of collisions to be detected. When they are 
detected, the specified response forces of Figure 6 are applied to the collided mass. 

In scalp collision, it is required to detect a collision between a scalp mesh triangle
and a mass of the hair segment. This is done in two steps. First, the mass must lie
on the “infinite” plane of the candidate triangle. If not, there is no need to apply
the second step since the mass cannot hit the triangle. If the mass is on the
plane, the second step decides whether the mass is inside the
boundaries of the candidate triangle. To perform this inside-the-boundary test, we
create three planes, one for each edge of the candidate triangle, with their normals all
pointing inward. The mass is judged to hit the triangle if and only if it is inside all
three planes. Applying these two steps to all of the mesh triangles reduces
real-time performance significantly. Therefore, a binary search approach is used to
find the candidate triangle. This is done with a bounding box approach in which the far
left triangles are ignored if the mass is on the right side with respect to the center of
gravity of the current region of the head, and vice versa (Figure 7).
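The two-step scalp collision test can be sketched as below. This is a minimal illustration under stated assumptions: the helper names and the tolerance `tol` for the on-plane check are hypothetical, and each edge plane is taken to contain the edge and the triangle normal, with its inward normal computed as the cross product of the triangle normal and the edge direction.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hits_triangle(p, a, b, c, tol=1e-6):
    """Return True if point p lies on triangle (a, b, c) within tol."""
    n = cross(sub(b, a), sub(c, a))          # triangle normal
    n_len = dot(n, n) ** 0.5
    if n_len == 0.0:
        return False                         # degenerate triangle
    # Step 1: p must lie on the infinite plane of the triangle.
    if abs(dot(sub(p, a), n)) / n_len > tol:
        return False
    # Step 2: p must be on the inner side of the three edge planes.
    for v0, v1 in ((a, b), (b, c), (c, a)):
        inward = cross(n, sub(v1, v0))       # normal pointing into the triangle
        if dot(sub(p, v0), inward) < 0.0:
            return False
    return True


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(hits_triangle((0.25, 0.25, 0.0), *tri))
```

A point above the plane fails the first step, while a point on the plane but beyond an edge fails the second.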




a) Result of binary search iteration 1: eliminate right region
b) Result of binary search iteration 2: eliminate specified region

Fig. 7. Eliminating irrelevant triangles before deciding the one being hit by the mass

In hair collision, the simplest algorithm is the detection of collision of every mass 
of each hair strand with all the masses of the remaining hair segments exhaustively. 
This is done by comparing the nearness of the two mass coordinates; i.e. by tolerating 
some amount of error. This tolerance is necessary since it is almost impossible to 
match two coordinates exactly when many forces are acting on each mass in each 
refresh cycle. 
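A minimal sketch of this tolerance-based nearness test is given below. It also includes an axis-aligned bounding-box pre-check built from the min/max coordinates of a segment, so that the per-mass comparison is skipped when the mass cannot be near the segment. The function names and the tolerance value are assumptions made for the example.

```python
def bounding_box(masses):
    """Axis-aligned box from the min/max coordinate along each axis."""
    xs, ys, zs = zip(*masses)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def inside_box(p, box, tol):
    """Cheap pre-check: is p inside the box enlarged by tol?"""
    lo, hi = box
    return all(l - tol <= c <= h + tol for c, l, h in zip(p, lo, hi))


def close(p, q, tol):
    """Two masses collide if their coordinates match within tol."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) <= tol * tol


def colliding_masses(mass, segment, tol):
    """Indices of the masses of `segment` that collide with `mass`."""
    if not inside_box(mass, bounding_box(segment), tol):
        return []                       # one box test instead of N mass tests
    return [i for i, q in enumerate(segment) if close(mass, q, tol)]


segment = [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, -2.0, 0.0)]
print(colliding_masses((0.05, -1.0, 0.0), segment, tol=0.1))
print(colliding_masses((5.0, 5.0, 5.0), segment, tol=0.1))
```

In a real simulation the box of each segment would be computed once per frame and reused for every mass tested against it, rather than rebuilt on each call as in this sketch.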

This approach is computationally expensive. For just one mass of the N masses of the
given segment, all N masses of another hair strand are tested, leading to a cost of
O(N^2) for just one hair collision. If we assume that there are N such segments on the
scalp, this cost increases to O(N^3). In short, we are forced to apply this O(N^3)
algorithm in each frame, which is done in periods of 0.33 seconds for 15 strands with
80 masses each. The bounding box idea can help. Instead of testing the given mass
against all N masses of the candidate hair segment every time, we put that candidate
segment into a bounding box and first make a single test for the intersection
between the mass and the box. Only if this initial test is passed, i.e. an intersection
is detected, do we test the mass against all the others. The box of a given hair segment
is computed from the min/max of its coordinates along each axis, as Figure 8 illustrates.

Fig. 8. Bounding box of a given hair segment




Fig. 9. Rendered versions of cluster representation 



Hair Rendering. According to the active LOD, several techniques of rendering are 
applied. When the viewing distance is very large, we switch to strip representation. 
Since strip is a subdivision surface, it has well-defined surface normals. Hence, bump 
texture maps are easily applied. Clusters are also bump-mapped since they have 
similar properties (Figure 9). Strands are just lines, thus they are drawn with a single 
shaded color. 

3 Results and Future Work 

The hair motion simulator was tested on a PC with 512 MB of RAM and a 2.4 GHz
processor. To evaluate the program, several tests were run; the results are given in
Table 1 (SC: scalp-to-hair collision; HC: hair-to-hair collision).

Table 1. Frame process time measured in seconds

                     no collision    SC      HC      SC + HC
1 Hair Strand        0.04            0.04    n/a     0.04
100 Hair Strands     0.08            0.27    0.88    >2.0
1 Hair Cluster       0.05            0.05    n/a     0.05
100 Hair Clusters    0.09            0.44    0.94    >3.0


The model used for the table consists of 17064 vertices for the head mesh, which make
up 5688 head mesh triangles, 312 scalp mesh vertices for 104 scalp mesh triangles,
and 80 masses for each hair strand. Real-time animation for a denser hair placement
than that given in Table 1 has not yet been achieved, and we are improving some of the
techniques used so that a more realistic head (Figure 9) can be animated close to real
time.

Abstract. This paper presents two kinds of novel haptic mouse systems as new
human-computer interfaces with force and tactile feedback capability.
The first one can reflect 1 dof grabbing force as well as 2 dof translation force.
A five-bar mechanism has been adapted to realize the 2 dof translation force
feedback, and a double prismatic joint mechanism has been used to implement
the grabbing force feedback. This system helps the user to feel grabbing force,
contact force and weight while picking up and moving an object in a virtual
environment. The second system can simulate surface roughness as well as
contact force. This system consists of two parts: a 2 DOF force feedback device for
kinesthetic display and a tactile feedback unit for displaying normal stimulation
to the skin and skin stretch. The proposed systems are expected to be
used as new human-computer interfaces by presenting realistic haptic interaction
in e-commerce or VR environments.



1 Introduction 

Since the advent of the first modern computer, “ENIAC”, in 1946, many types of
computer interface have been developed to facilitate the use of computers. Punch cards
and high-speed magnetic tape were developed in an early stage of computer interface
history. The mouse and the touch screen were developed in 1963 and 1970, respectively.
Most recently, the touch pad was developed in 1988. However, the computer interfaces
developed so far have been used unilaterally, only as input devices to transfer
information to the computer, and could not play an interactive role. In addition, as the
demand for more realistic methods of communication with the computer grows, haptics
has emerged as a new element in computer interfaces. As stated in an article in
MIT’s Technology Review, the haptic interface will become “an expected part of
interfaces of all kinds of computing devices.” Especially in the areas of telepresence
and virtual presence, the demand for communication with haptic information is ever
growing, and the emergence of the haptic interface as the third interface, beyond the
visual and audio interfaces, is self-evident [19].

So far, many types of haptic interfaces with the computer have been developed.
PHANToM™ of SensAble Technologies [17] and CyberForce™ of Immersion
Corporation [11] are typical examples of commercial haptic devices. However, these
haptic devices are inadequate as computer interfaces since they cause fatigue when
people use them for a long time. Among the possible types of interfaces, mouse-type
interfaces have received much attention. Since people have used the mouse as a
main computer interface for a long time, adding haptic functionality to the mouse is
a natural extension of an existing computer interface. A number of research
works have proposed mouse-type haptic interfaces with the computer. Since
the operating system of the computer changed to a Windows-based environment,
research on the mouse-type haptic interface, which we define here as the haptic mouse,
has focused on delivering interactions with the GUI to the human in the form of haptic
sense. Akamatsu developed a multi-modal mouse that presents tactile sense via a small pin
driven by a solenoid on the index finger and kinesthetic sense via an electromagnet
on the bottom of the mouse [1,2]. Through button-clicking experiments with the
help of haptic cues, he demonstrated the usefulness of haptic information in computer
interfaces. Kelly and Salcudean developed the MagicMouse system, which receives
position input through a PSD sensor and feeds back 2 dof forces through an
electromagnetic voice coil actuator in the X Windows environment. With the MagicMouse
they built haptic models of elements such as buttons, menu bars, and icons in Windows
and asserted that the inclusion of force feedback increases the efficiency of work with
the computer [12]. In addition to the devices mentioned, many other types of interfaces
that implement grabbing force feedback exist [4,5].

The objective of this paper is to present two kinds of new haptic mouse systems
that are able to convey the virtual environment to the human more realistically, with
grabbing force feedback and tactile feedback respectively. In Section 2, we explain a
haptic mouse that provides 2 dof translational forces and grabbing force feedback.
The overall structure, design specifications and evaluation of the system are described
in this section. In Section 3, we suggest an integrated tactile display mouse that
provides kinesthetic force, pressure distribution, vibration and skin stretch. It helps
the user to feel the shape of a virtual object and the roughness of the contact surface
simultaneously. In Section 4, we briefly mention the application of the proposed haptic
mouse to e-commerce and 3D CAD (Computer Aided Design) systems.




Fig. 1. A motor is embedded into the mouse body, which is laid on the holed plate. A
special mechanism is controlled by two other motors, and the mouse body is connected to
the mechanism under the plate

2 Grabbing Force Feedback Mouse System

A new haptic mouse system is designed by adding force feedback to the conventional
mouse. The virtual environment, for example, icons and the menu bar of the Windows
system as well as graphically rendered objects, can be felt through the sense of touch.
Figure 1 shows the overall configuration of the haptic mouse system.



2.1 Overall Structure 





Fig. 2. Two motors and five bar mechanism provide 2 dof translational forces 



The hardware of the proposed haptic mouse system can be divided into two parts. The
first part provides 2 dof translation force feedback. A five-bar mechanism is
adapted for this purpose with the ground link length set to zero, as shown in Fig. 2.
The workspace of the five-bar linkage is mapped one-to-one to the pixels of the
computer screen. A wire-driven mechanism is used to provide back-drivability as well
as to eliminate problems arising from backlash.

The second part provides grabbing force feedback, as shown in Fig. 3. A pulley
is attached to the axis of the DC motor, which provides actuating force to the bar
tangent to the pulley. Users feel the grabbing force by pressing the finger pads on
each side of the mouse. A wire-driven mechanism is also used here, and the linear motion
of the finger pads is guided by ball bush bearings.




Fig. 3. This device adopts a wire-driven mechanism for actuation. Two prismatic joints
are tangentially attached to the pulley with wire. The two prismatic joints move inward
and outward together, transmitting the one dof grabbing force generated by the motor

2.2 Design Specifications

To determine the design specifications for the new haptic mouse, we performed three
experiments. In the first experiment, we measured the force and stiffness combination
that gives a hard wall feeling to the user’s wrist while holding the haptic mouse with a
precision grasp. To provide an environment similar to the haptic mouse, a mouse was
attached to the stylus of a PHANToM™. Precision grasping is assumed for mouse
manipulation. Eight subjects participated in the experiment. Each subject felt the virtual
wall with the mouse attached to the PHANToM™ by tapping the virtual wall. The virtual
wall is modeled as a spring whose stiffness varies from 1.0 N/mm to 2.4 N/mm at
0.2 N/mm intervals. The maximum force output of the PHANToM™ varies from 1 N to 5 N
at 1 N intervals. Each subject felt the 40 combinations of the virtual wall and was
required to answer whether or not they felt a hard wall. This test was repeated
twice, and only answers that were consistent across both runs were
recorded. From this experiment, we concluded that the five-bar mechanism should be
able to present at least 4 N in all directions and that the stiffness of a virtual wall
should be at least 2.4 N/mm for it to be perceived as a hard wall by the user’s wrist.
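The spring-type virtual wall and the 40 test conditions can be illustrated with a short sketch. The stiffness and force ranges come from the experiment described above; the function name is hypothetical, and treating the maximum force output as a simple saturation of the spring force is an assumption of this example.

```python
def wall_force(penetration_mm, stiffness_n_per_mm, max_force_n):
    """Spring-model wall force in newtons, saturated at the device limit."""
    if penetration_mm <= 0.0:
        return 0.0                      # not touching the wall
    return min(stiffness_n_per_mm * penetration_mm, max_force_n)


# 8 stiffness levels (1.0 to 2.4 N/mm) x 5 force limits (1 to 5 N) = 40 conditions
stiffness_levels = [round(1.0 + 0.2 * i, 1) for i in range(8)]
force_limits = [float(n) for n in range(1, 6)]
conditions = [(k, f) for k in stiffness_levels for f in force_limits]

print(len(conditions))
print(wall_force(1.0, 2.4, 4.0), wall_force(5.0, 2.4, 4.0))
```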

In the second experiment, we measured the force and stiffness combination that
gives a hard object feeling to the thumb and ring finger, which are used to grab the
virtual object. Likewise, we concluded that the grabbing force feedback mechanism
should be able to present at least 3.5 N at each finger and that the stiffness of a
virtual object should be at least 2.2 N/mm for it to be perceived as a hard object by
the fingers.




Fig. 4. a) Subjects are required to click buttons on the edge and the button at the
center in turn until they have clicked all the buttons on the edge. b) Measurement
result of one subject



In the third experiment, we measured the average mouse workspace required when the
subject uses the mouse most comfortably. The
workspace needed to complete a given task was measured with a magnetic position
tracker (Flock of Birds, FOB). The task is designed to measure the
motion range of the point on the mouse to which the end-effector of the five-bar
mechanism will be attached. As shown in Fig. 4(a), buttons are placed along the edge of
the monitor to obtain the maximum range of motion of the mouse. Subjects are
required to click the buttons on the edge and the button at the center in turn until
they have clicked all the buttons on the edge. Before performing the task, subjects were
allowed to adjust the resolution of the mouse until they felt it was comfortable to use.
Ten subjects participated in the experiment, and Fig. 4(b) shows the measurement
result of one subject. Excluding the two extreme cases, the results of the remaining 8
subjects were averaged. As a result, a 32 mm by 22 mm rectangular area was obtained,
which is used as the workspace of the mouse, bounded by the hole of the plate under the
mouse.
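As a rough illustration of how a 32 mm by 22 mm workspace could be mapped onto screen pixels, consider the sketch below. The linear mapping, the origin at one corner of the workspace, the function name, and the 1024 x 768 screen resolution are all assumptions for the example and are not taken from the paper.

```python
def workspace_to_pixel(x_mm, y_mm, width_mm=32.0, height_mm=22.0,
                       screen_w=1024, screen_h=768):
    """Map a workspace position in mm to a (column, row) pixel, linearly."""
    # Clamp to the workspace bounded by the hole of the plate.
    x_mm = min(max(x_mm, 0.0), width_mm)
    y_mm = min(max(y_mm, 0.0), height_mm)
    col = min(int(x_mm / width_mm * screen_w), screen_w - 1)
    row = min(int(y_mm / height_mm * screen_h), screen_h - 1)
    return col, row


print(workspace_to_pixel(16.0, 11.0))
```

With these assumed numbers, one pixel corresponds to about 0.03 mm of mouse travel in each direction.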



2.3 Comparison of Point Force with Grabbing Force 





Fig. 5. One of five different object shapes (triangle, square, diamond, circle
and ellipse) is randomly presented, and the subjects are expected to tell the shape of
the object by point force feedback and by grabbing-force-added feedback respectively



We performed an experiment to evaluate the usefulness of the grabbing force feedback.
As shown in Fig. 5, a circular area indicated by an arrow is located on the computer
screen. In this area, one of five different object shapes (triangle, square, diamond,
circle and ellipse) is randomly presented, and the subjects are expected to tell the
shape of the object. The visual information is shut off and only haptic
information is provided. The subjects performed the experiment in two modes: one
with point force interaction, using only the five-bar force feedback mechanism acting
on the mouse body, and the other with grabbing force interaction, using both the
grabbing force feedback mechanism and the five-bar mechanism. Under these experimental
conditions, we measured the time taken for the subject to identify the shapes of the
objects correctly.




Fig. 6. Grabbing force feedback helps the user to recognize the shape more intuitively

Figure 6 shows the experimental results for 8 subjects. As expected,
correctly identifying the shape of an object takes less time with grabbing force
interaction than with point force interaction. From this result, we can say that the
concurrent feedback of grabbing force and translational force can produce more intuitive
and effective haptic interaction than ordinary point force feedback by translational
force only.



3 Integrated Tactile Display Mouse System 

Tactile sensation is essential for many manipulation and exploration tasks not only in
a real environment but also in a virtual environment. While touching or feeling the
surface of an object with their fingers, humans can perceive complex shapes and
textures through physical quantities such as pressure distribution, vibrations from
slipping and stretching of the finger, and temperature on the finger. Based on this
tactile information, humans can understand the features of an object and can
manipulate it precisely.

In this section, we introduce a mouse-type tactile display that provides physical
quantities related to the tactile sense, for example contact force, small-scale shape,
vibration, and skin stretch. Contact force is related to the recognition of the shape and
stiffness of an object, and small-scale shape on the surface is represented by
distributed pressure. Vibration is related to surface roughness, and skin stretch is
caused by active touch (rubbing) [6,13,16]. Since tactile sensation is related to the
several kinds of sensing elements stated above, several preliminary studies on the
tactile display have been conducted.

There has been much research on tactile displays. Ikei et al. designed a
vibrotactile device [8]. Hayward and Cruz-Hernandez focused on the tactile sensation
of lateral skin stretch [7] and suggested a tactile display device for stimulating a
small displacement of distributed lateral skin stretching up to several kilohertz.
Asamura et al. studied a tactile display that stimulates the skin receptors selectively
[3]. However, previous tactile display devices sometimes involve user discomfort and are
too big to be embedded into a mouse system.

In this section, we propose a new type of tactile display and an integrated tactile 
display system that can simultaneously provide kinesthetic and tactile feedback. The 
tactile display unit that can display contact force, small-scale shape on the surface, 
normal vibration and lateral skin stretch is small enough to be embedded into the 
mouse. 



3.1 Overall Structure and Design Specifications 

Previous physiological and psychophysical studies on the human sense of touch show 
the change of sensitivity by frequency variation [11,12,17] and rubbing method [16]. 
We determined the principal requirements of a tactile display device for simulating a 
realistic sense of touch based on the literature survey. The requirements are as 
follows: 

• Normal stimulus to the skin in the frequency range of 0 Hz to 500 Hz

• Displacement of 10 times the threshold value of the nervous system [13]

  - 60 dB re 1 µm (about 1 mm) in the low frequency band

  - At least 35 dB re 1 µm (about 56.2 µm) in the high frequency band

• Distributed pressure for displaying the static small-scale shape

• Sufficient ability to deform the skin in the normal direction by up to 1 mm

• Small size to be embedded into a PC mouse

• Mechanical structure to provide force and tactile feedback simultaneously

• Selectively providing an active touch mode and a passive touch mode
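The displacement figures in the list above are consistent with decibel levels taken relative to 1 µm using the amplitude convention of 20 log10. The short sketch below checks that arithmetic; the function name is hypothetical, and the 1 µm reference is inferred from the paired values in the list rather than stated explicitly.

```python
def db_to_micrometres(level_db, reference_um=1.0):
    """Convert a displacement level in dB (re reference_um) to micrometres."""
    return reference_um * 10 ** (level_db / 20.0)


print(db_to_micrometres(60))            # low frequency band, in micrometres
print(round(db_to_micrometres(35), 1))  # high frequency band, in micrometres
```

Under this convention, 60 dB corresponds to 1000 µm (1 mm) and 35 dB to roughly 56.2 µm, matching the values quoted in the requirements.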




Fig. 7. We use the 2 DOF translation force feedback mechanism mentioned in section 2.1 for 
an integrated tactile display system. In addition, the tactile display unit substitutes for the grab- 
bing force feedback part 



Figure 7 shows the design drawing and an implemented assembly of the integrated
tactile display mouse, which simultaneously presents shape and texture. The
proposed tactile display unit is attached to the joint of the two link manipulators. The
translational forces from the five-bar mechanism describe the shape and stiffness, and
the tactile information from the tactile display unit describes the texture. The tactile
display unit is designed to meet the requirements stated above. More details on the
design parameters are described in [14].

The tactile display unit is designed to simulate small static shapes by distributed
pressure, vibration and lateral movement across the contact region. Although the
finger pad on the tactile display surface is immobile, the display surface can be
passively moved across the contact area of the skin. As a result, the hardware of the
tactile display unit consists of two parts: one for normal vibrotactile stimulation and
the other for lateral movement of the display region, which triggers the skin stretch.
The size of the mouse body is approximately 80x35x60 mm and that of the tactile display
unit is 50x14x18 mm. While the user is grabbing the mouse, the 2 dof translational force
is transmitted to the mouse body and the tactile display unit stimulates the finger pad
of the thumb.

The tactile display unit is composed of two main parts. The first part comprises
a pin array and eight piezoelectric bimorphs for normal vibrotactile stimulation
(see Fig. 8). Each piezoelectric bimorph has a displacement larger than 1 mm and a
low operating input voltage (60 V). In addition, its response time is on the order of
milliseconds and it can provide a force of up to 0.5 N. Srinivasan has studied the
relevant skin mechanics and found the conditions under which a stimulus is perceivable
[18]. The specifications of the tactile stimulator with piezoelectric bimorphs were
verified to meet those requirements. Hence, the tactile stimulator is adequate for
normal vibrotactile stimulation, satisfying the required frequency range and stimulation
intensity. We also confirmed that the actuator does not deflect under the normal
rubbing on the surface used to sense the texture.




Fig. 8. Piezoelectric bimorphs are clamped with 1 mm spacing and the 6x1 pin array is
attached to the tip of each bimorph. The pin spacing is 1 to 1.5 mm and the diameter of
each pin is 0.5 or 0.7 mm, enabling the display of a texture 8 mm wide

The second part is a 1 DOF translation mechanism consisting of a small rotary motor
with a ball screw for the lateral movement of the tactile display unit, which triggers
the lateral skin stretch. The tactile display unit is fixed on this mechanism for linear
motion. The range of linear motion and the maximum velocity are 50 mm and 150 mm/sec
respectively.



3.2 Concurrent Stimulation of Kinesthetic Force Feedback and 
Tactile Feedback 

In this section, the concurrent feedback of the kinesthetic force and the tactile
display is investigated to verify the effectiveness of tactile feedback in transmitting
surface properties. For an objective and absolute comparison of touch feeling, we
designed four specimen prototypes with different degrees of surface roughness. Figure 9
shows the design specification of the specimens used in the experiment.




[Figure 9 panels: sample 1, sample 2, sample 3, sample 4; dimension label: 20 mm]

Fig. 9. The specimens have an identical rounded shape; however, each one has different
surface properties represented by changes in the height and the wavelength. The material
used in each of the specimens was identical to the material used in the pins of each
tactile display

Real specimen prototypes were given to the participants, and a simulated stimulus
corresponding to one of them was presented through the integrated tactile display mouse.
The stimulus was randomly presented twice and each participant was instructed to mark
the closest specimen, with no chance to change his or her answer. As shown in Fig.
10, the concurrent feedback of force and a tactile stimulus can simulate virtual objects
more effectively than force feedback only. Although participants matched the
surface texture relying solely on tactile feeling from the beginning, the ratio of
correct answers is higher than 80 percent for samples 2 and 3. Kinesthetic feedback
produced a high ratio of correct answers for sample 4; this result, however, could be
unreliable because the participants tended to judge that sample 4 was being
presented whenever they felt a relatively smooth stimulus.




Fig. 10. Four kinds of stimulating methods are applied: force feedback only; force
feedback + static small-scale shape display; force feedback + static small-scale shape
with rubbing; and force feedback + normal vibration (horizontal axis: sample number).
Force feedback, which is generated by the 2 dof link mechanism, helps the user to
recognize the shape and the stiffness of the virtual object, and tactile feedback, which
stimulates the finger pad of the thumb, helps the user to feel the roughness of the
surface through static small-scale shape, vibration and lateral movement



4 Application Areas 

A major drawback of current Internet shopping malls is that we cannot actually touch
the goods. If haptic information such as surface texture and weight is additionally
provided, consumers can make better judgments. In practice, e-commerce with haptic
information can be partly realized. For example, the consumer logs on to a virtual
sweater shop web site and clicks on a favorite fabric. Then, the haptic information
file containing the textures of the fabric is downloaded to the local computer with the
haptic mouse, and the consumer can feel the texture of the sweater at home.

The haptic mouse can also be applied to 3D CAD software. The user can assemble
the parts while feeling the contact force. The haptic mouse helps the user to determine
whether the parts can be assembled or not. In addition, the sense of touch can help
the user to determine a suitable clearance for assembled parts.

5 Conclusion

In this paper, we introduced two kinds of haptic mouse systems. They are optimally
designed based on psychophysical and physiological studies. The grabbing force
feedback mouse can provide grabbing force as well as translational force, which helps
users to feel the shape intuitively. The integrated tactile display system can simulate
shape and roughness simultaneously by several stimulating methods. We verified
the effectiveness of the proposed systems compared to previous methods. The haptic
mouse introduced in this paper can potentially overcome the limitations of existing
haptic interfaces by adding kinesthetic and tactile feedback to the mouse. As
future work, the two kinds of mouse will be combined.

Abstract. Developing finite state natural language processing resources
(such as morphological lexicons) and applications (such as light-parsers)
is also a complex software engineering enterprise which can benefit from
additional tools that enable developers to manage the complexity of
the development process. We describe Vi-xfst, a visual interface and development
environment for developing finite state language processing applications
using the Xerox Finite State Tool, xfst, to address some of these engineering
concerns. Vi-xfst lets a user construct complex regular expressions
via a drag-and-drop visual interface, treating simpler regular expressions
as “Lego Blocks.” It also enables the visualization of the topology of the
regular expression components at different levels of granularity, enabling
a user to easily understand and track the structural and functional
relationships among the components involved.



1 Introduction 

Finite state machines are widely used in many natural language processing 
applications to implement components such as tokenizers, morphological 
analyzers/generators, shallow parsers, etc. Large scale finite state language processing 
systems built using tools such as the Xerox Finite State Tool [1,2,3], van 
Noord’s Prolog-based tool [4] or the AT&T weighted finite state machine suite [5] 
involve tens or hundreds of regular expressions which are compiled into finite 
state transducers that are interpreted by the underlying run-time engines of the 
(respective) tools. Developing such large scale finite state systems is currently 
done without much support for the “software engineering” aspects. Regular 
expressions are constructed manually by the developer with a text editor and 
then compiled, and the resulting transducers are tested. Any modifications have 
to be done afterwards on the same text file(s) and the whole project has to 
be recompiled many times in a development cycle. Visualization, an important 
aid in understanding and managing the complexity of any large scale system, 
is limited to displaying the finite state machine graph (e.g., Gansner and North 
[6]). However, such visualization (somewhat akin to visualizing the machine code 
of a program written in a high-level language) is not very helpful, as developers 
rarely, and possibly never, think of such large systems in terms of states and 
transitions. The relationship between the regular expressions and the finite state 
machines they are compiled into is opaque except for the simplest of regular 
expressions. Further, the size of the resulting machines, in terms of states and 
transitions, is very large, usually in the thousands to hundreds of thousands 
of states, if not more, making such visualization meaningless. 

This paper describes the salient features and the use of a graphical envi- 
ronment for developing finite state natural language processing resources and 
applications. The paper is organized as follows: In the next section we provide 
an overview of the functionalities provided by the graphical environment Vi-xfst 
[7], demonstrating the functionality by running through a simple example. We 
then discuss selective compilation and finally end with some closing remarks. 

2 Vi-xfst 

Vi-xfst is developed as a visual front end for the Xerox Finite State Tool, 
xfst [2], a sophisticated command-line-oriented interface developed by Xerox 
Research Centre Europe for building large finite state transducers for language 
processing resources and applications. Users of xfst employ a high-level regular 
expression language which provides an extensive repertoire of high-level 
operators. 1 Such regular expressions are then compiled into finite state transducers 
and interpreted by a run-time engine built into the tool. xfst also provides a 
further set of commands for combining, testing and inspecting the finite state 
transducers produced by the regular expression compiler. 

To motivate the examples that will be coming later, we provide here a very 
simple finite state description of a (synthetic) phonological phenomenon that can 
be described by a sequence of two finite state transducers that implement 
top-down contextual replace operations, which are then composed to give a single 
transducer. 

In this example, an abstract lexical string kaN with an abstract (nasal) 
phoneme N, is concatenated to the suffix pat. 2 As it happens, when these mor- 
phemes are concatenated, phonemes at the morpheme boundaries undergo 
changes. The N changes to an m when it is followed by a p at a morpheme 
boundary. This is captured by a replace rule 

define Rule1 [ N -> m || _ p ] ; 

expressed in the xfst regular expression language. This rule describes a finite state 
transducer which maps the symbol N in the upper string of the transducer to an 
m in the lower string, provided it is followed by a p in the upper string; that is, N 
is obligatorily replaced by m when it appears in the context before p. Figure 1 
shows the traditional representation of the transducer compiled from this rule. 

For instance, the lexical upper string kaNpat would be transformed into kampat 
as a lower string by this transducer. 

1 Details of the operators are available at 
http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax.html and 
http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax-explicit.html. 

2 This example is from Beesley and Karttunen [3]. 




Developing Finite State NLP Systems with a Graphical Environment 






Fig. 1. The transducers for the replace rule Rule1 on the left and for the replace 
rule Rule2 on the right 



A second rule in this language states that a p that occurs just after an m 
gets realized as m. This is formalized as a second replace rule 

define Rule2 [ p -> m || m _ ] ; 

Thus for example when the output string kampat is applied to this transducer 
as the upper string, it would produce kammat. 

As usual with finite state transducers, the transducers for these rules can be 
composed at “compile” time to get a single transducer 

define CombinedRules Rule1 
                     .o. 
                     Rule2 ; 

We have intentionally written this combination so that Rule1 is the upper rule 
and Rule2 is the bottom rule, and so that the output of Rule1 feeds into Rule2. 
Thus when the string kaNpat is applied to the combined transducer, the output 
that is produced is kammat. Figure 2 shows the traditional representation of the 
composed transducer compiled from this combination of rules. 
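
The effect of the two rules on this particular string can be mimicked with ordinary string rewriting. The sketch below is only an illustration in Python using the standard `re` module, not xfst itself. The function names are hypothetical, and lookahead/lookbehind patterns stand in for the rule contexts, which coincides with the transducer behaviour for this simple example but is not a general model of replace rules.

```python
import re


def rule1(upper: str) -> str:
    # Stand-in for Rule1: N is rewritten as m when the next symbol is p.
    return re.sub(r"N(?=p)", "m", upper)


def rule2(upper: str) -> str:
    # Stand-in for Rule2: p is rewritten as m when the previous symbol is m.
    return re.sub(r"(?<=m)p", "m", upper)


def combined_rules(upper: str) -> str:
    # Composition: the output of rule1 is fed to rule2 as its input.
    return rule2(rule1(upper))


print(rule1("kaNpat"))
print(combined_rules("kaNpat"))
```

Applying the functions one after the other corresponds to running the two transducers in sequence; the composed transducer gives the same mapping in a single step.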

Obviously this is a very simple example transducer; real finite state systems 
usually get very complex with transducers combined in various ways to compile 
into a single very large transducer. For instance, a large scale finite state pro- 
nunciation lexicon developed for Turkish [8], uses about 700 regular expressions 
in addition to various lexicons, distributed to over 115 files. The resulting finite 
state transducer has over 6 million states and 9 million transitions. 3 

3 A demo of this pronunciation lexicon is at http://www.hlst.sabanciuniv.edu. 

Fig. 2. The transducer for the composed sequence of rules 



2.1 Using Vi-xfst 

Vi-xfst enables incremental construction of complex regular expressions via a 
drag-and-drop interface, treating simpler regular expressions as “Lego Blocks”. 
Vi-xfst also enables the visualization of the topology of the regular expression 
components, so that the developer can have a bird’s eye view of the overall sys- 
tem, easily understanding and tracking the relationships among the components 
involved. Since the structure of a large regular expression (built in terms of other 
regular expressions) is now transparent, the developer can interact with regular 
expressions at any level of detail, easily navigating among them for testing and 
debugging. 

When we start developing a new project with Vi-xfst, the interface looks 
as shown in Figure 3. The window on the left hand side displays the regular 
expressions already defined (none yet here since we have not defined any), the 
top right window is where a regular expression is visualized, and the bottom right 
window is used for debugging and other operations. 

To start, we select an operator template from the operator palette above (or 
from the Insert menu) and extend it if necessary (with more operands, 
conditions, etc.). This brings up a visualization template on the screen. We then enter 
the operands by either selecting from the palette of already defined regular 
expressions in the left window or directly entering a regular expression, and save 
it, optionally giving it a name. For example, when Rule1 above is entered, 
the top right window looks as in Figure 4. The regular expression for Rule2 
can similarly be defined. Once these transducers are defined, we bring in 
a template for the composition operator .o. in which the slots for the operand 
transducers are laid out vertically. We then select the two rules from the 
Definitions palette on the left and just drop them on their respective 
slots in the composition template. The result of this is depicted in Figure 5. At 
this point one can also visualize the combined rule by fully expanding each of 
its components, as shown in Figure 6. 

Fig. 3. The initial layout of the Vi-xfst screen 

Fig. 4. Constructing the first rule 

Fig. 5. Constructing the combined rule 

Although this is a very simple example, we believe it shows important aspects 
of our functionality. A visualization of a complex network employing a different 
layout of the replace rules is shown in Figure 7. Here we see a portion of a 
Number-to-English mapping network 4 where different components are visualized 
at different structural resolutions. 

4 Due to Lauri Karttunen; see http://www.cis.upenn.edu/~fjcis639/assign/ 
assign8.html for the xfst script for this transducer. It maps numbers like 1234 
into English strings like One thousand two hundred and thirty four. 








Fig. 6. Fully expanded visualization of the combined regular expression 



3 Maintaining Regular Expression Dependency and 
Selective Compilation 

Selective compilation is one of the simple facilities available in many software 
development environments. A software development project uses selective 
compilation to compile only the modules that have been modified and those that depend 
(transitively) in some way (say, via header file inclusion) on the modified 
modules. This selective compilation scheme, typically known as the make operation, 
depends on a manually or automatically generated makefile capturing dependencies. 
It can save time during development as only the relevant files are recompiled 
after a set of modifications. 

In the context of developing large scale finite state language processing 
applications, we encounter the same issue. During testing, we recognize that a certain 
regular expression is buggy, fix it, and then have to recompile all others that 
use that regular expression as a component. Vi-xfst provides a selective 
compilation functionality to address this problem by automatically keeping track of 
the regular expression level dependencies as they are built via the drag-and-drop 
interface. This dependency information can then be exploited by Vi-xfst when a 
recompile needs to be done. 

Fig. 7. Mixed visualization of a complex regular expression 

After one or more regular expressions are modified, we first recompile (by 
sending a define command to xfst) those regular expressions, and then recompile 
all regular expressions that depend on them, starting with the immediate 
dependents and traversing systematically upwards to the “top” nodes on which 
no other regular expressions depend, making sure that: 

- all regular expressions that a regular expression depends on and that have to be 
recompiled are recompiled before that regular expression is recompiled, and 

- every regular expression that needs to be recompiled is recompiled only once. 

To achieve these, we compute the subgraph of the dependency graph that has 
all the nodes corresponding to the modified regular expressions and any other 
regular expressions that transitively depend on these regular expressions. Then, 
a topological sort of the resulting subgraph gives a possible linear ordering of the 
regular expression compilations. For instance, in the simple example, if we have 
to change the right context of Rule1 to, say, p t, then the transducer Rule1 
would have to be recompiled and then CombinedRules would need to be recompiled 
since it depends on Rule1. 
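
The ordering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Vi-xfst implementation: the function name `recompile_order`, the dictionary `uses` (mapping each regular expression name to the names it is built from) and the use of Kahn's algorithm for the topological sort are all assumptions made for the example.

```python
from collections import deque


def recompile_order(uses, modified):
    """Return a valid recompilation order for `modified` and its dependents.

    uses: dict mapping a definition name to the names it is built from
          (hypothetical representation of the dependency graph).
    """
    # Invert the graph: for each name, which definitions use it directly?
    dependents = {}
    for name, parts in uses.items():
        dependents.setdefault(name, set())
        for part in parts:
            dependents.setdefault(part, set()).add(name)

    # Collect the modified definitions and everything that transitively
    # depends on them.
    affected = set()
    stack = list(modified)
    while stack:
        name = stack.pop()
        if name not in affected:
            affected.add(name)
            stack.extend(dependents.get(name, ()))

    # Kahn's algorithm restricted to the affected subgraph.
    indegree = {n: len(set(uses.get(n, ())) & affected) for n in affected}
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dep in sorted(dependents.get(name, ())):
            if dep in affected:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    ready.append(dep)
    return order


uses = {"Rule1": [], "Rule2": [], "CombinedRules": ["Rule1", "Rule2"]}
print(recompile_order(uses, ["Rule1"]))
```

Each name in the returned list appears exactly once and only after every affected definition it is built from, which matches the two conditions listed earlier. Rule2 is not in the result because it neither changed nor depends on Rule1.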

4 Conclusions 

We have described Vi-xfst, a visual interface and a development environment for 
the development of finite state language processing resources and application 
components, using the Xerox Finite State Tool xfst. In addition to a drag-and- 
drop user interface for constructing regular expressions in a hierarchical manner, 
Vi-xfst can visualize the structure of a regular expression at different levels of 
detail. It also keeps track of how regular expressions depend on each other and 
uses this dependency information for selective compilation of regular expressions 
when one or more regular expressions are modified during development. 

Abstract. In this study, we propose a bandwidth-aware scaling mechanism for rate 
adaptive video streaming. This mechanism involves estimating the capacity of the 
network dynamically by measuring bottleneck bandwidth and available bandwidth 
values. Taking the available bandwidth as an upper limit, the sender adjusts its output 
rate accordingly. While increasing the quality of the video, using a bandwidth estimator 
instead of probing prevents the congestion generated by the streaming application itself. 
The results of the bandwidth-aware algorithm are compared with those of a similar 
algorithm with no bandwidth-aware scaling, and the improvement is demonstrated with 
measurements taken over a WAN. 



1 Introduction 

In Internet video streaming applications, compressed video streams need to be 
transmitted over networks that have varying bandwidth conditions. At any time, 
making the best use of available network resources and guaranteeing the maximum level 
of perceptual video quality from the end-user’s perspective require the use of rate 
control mechanisms in video streaming systems [1]. Over-rating the output of a video 
streamer can cause an undesirable traffic explosion and can lead to congestion in the 
network. On the other hand, uncontrolled reduction of the output bit rate of a video 
streamer leads to unnecessary quality degradation and inefficient use of available 
bandwidth resources. Therefore, to achieve the best trade-off between quality and 
congestion, adaptive strategies have to be developed [2, 3]. 

In many studies, rate adaptation algorithms are based on observed variables 
such as packet loss rate and delay. These variables provide some information about 
congestion in classical wired networks. However, loss rate in particular cannot be a 
good indicator of congestion in wireless networks. On the other hand, to eliminate 
jitter, efficient receiver buffer management policies should be developed [4]. Another 
important parameter that can efficiently be used is available bandwidth. A good 
estimate of available bandwidth can provide preventive congestion control. However, 
integration of a bandwidth estimation algorithm into an adaptive video streamer is not 
an easy task. Firstly, bandwidth estimation requires sending extra burst packets that 
bring a considerable overhead into the system. Secondly, these burst packets have to 
be transmitted over the path of the streamer. Finally, to meet the real-time limitations 
of the streaming system, the bandwidth estimator should be very fast, unlike the 
methods proposed in the literature. 

In this paper, we propose a bandwidth-aware rate control algorithm which is 
modified from our previously developed adaptive rate control algorithm that did not 
contain any bandwidth estimator [5]. In [5], quality increases until loss occurs, which 
means that congestion must have already started. Embedding a bandwidth estimator 
into our rate control algorithm avoids this congestion that is caused solely by our 
streamer. Another improvement is optimal selection of video quality during the initial 
buffer filling period. This minimizes the number of changes of video parameters which, 
in turn, improves perceived video quality. 

The paper is organized as follows: In Section 2, a brief review of our adaptation 
algorithm is given. In Section 3, the bandwidth estimator module developed for our 
algorithm is introduced. In Section 4, our bandwidth-aware scaling algorithm and 
initial quality detection method are examined in detail. In Section 5, performance 
results on WAN environment are given. Finally, in Section 6, concluding remarks are 
made. 



2 Rate Adaptive Streaming Algorithm 

In this section, we briefly review our previously developed streaming algorithm [5]. 
Rate adaptation is achieved by video scaling in a seamless manner via frame dropping, 
switching to another encoding rate or changing the packet interval. The video is 
encoded at multiple encoding rates and stored in the database. A metadata file is 
prepared by the packetization module for every encoding rate of the video. The 
packetization module determines the number of packets to be sent per video file for each 
encoding rate and frame discard level pair. The transmission interval between consecutive 
packets is calculated as the video duration divided by the number of packets for each pair. 
Packet interval values are also stored in the metafile. 
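
The packet interval computation is simple enough to write down directly. The following Python sketch is illustrative only: the function name `packet_interval`, the 60-second duration and the packet counts per (encoding rate, frame discard level) pair are hypothetical values chosen for the example, not figures from our metafiles.

```python
def packet_interval(video_duration_s: float, packet_count: int) -> float:
    """Interval between consecutive packets: duration divided by packet count."""
    if packet_count <= 0:
        raise ValueError("packet_count must be positive")
    return video_duration_s / packet_count


# Hypothetical packet counts for two (encoding rate in kbps, FDL) pairs
# of a 60-second video.
packet_counts = {(1000, 0): 6000, (500, 0): 3000}
intervals = {pair: packet_interval(60.0, n) for pair, n in packet_counts.items()}
print(intervals)
```

A pair that needs fewer packets for the same video duration gets a longer interval, so switching pairs changes the transmission rate without changing the playback duration.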

Frame dropping is performed in levels. Considering the dependency among frame 
types, the adaptation module drops first B frames and then, if necessary, P frames 
from the current GOP pattern. Each encoding rate (ER) and frame discard level (FDL) 
pair corresponds to a different transmission rate in the network. Encoding rates take 
the values 1000, 500, 200 and 100 kbps. Frame rates take the values 30, 20, 10, 5 and 
1 fps. A grid is formed by using ER and FDL combinations. On this grid, depending on 
the receiver buffer and the network congestion status, the appropriate ER-FDL pair is 
chosen for adaptation that follows an AIMD (Additive Increase Multiplicative 
Decrease) strategy to preserve TCP-friendliness. 

Table 1. Observed and controlled variables of the algorithm 



Observed variables   Source                  Controlled variables   Source 
Loss rate            RTCP RR feedback        Encoding rate (ER)     Sender 
ttp                  UDP receiver feedback   GOP pattern (FDL)      Sender 
dttp                 UDP receiver feedback   Packet interval (PI)   Sender 



Table 1 shows the observed and controlled variables of the rate control algorithm. 
The receiver periodically sends loss rate, current stored video duration, denoted ttp, 
and rate of change of ttp, denoted dttp, to the sender. Examining these data and 
current values of the controlled variables ER, FDL and PI, the algorithm running at 
the sender decides on the trade-off between the video quality and congestion. Further 
details can be found in [4, 5]. 

Our algorithm follows a conservative approach by allowing quality degradations 
after two adaptation requests and quality increases after five adaptation requests. We 
observed that a non-conservative approach that reacts to adaptation requests 
immediately resulted in frequent rate oscillations, displeasing the viewer. The 
conservative approach based on a hysteresis model preserved the prevailing quality 
until an indicator of persistent behaviour in congestion and buffer status is available, 
thereby eliminating the disturbance of the viewer. 
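
The conservative behaviour can be pictured as a small counter-based gate in front of the scaling decision. The Python sketch below is an assumption-laden illustration rather than our actual code: the class name `HysteresisGate` is hypothetical, and it assumes that the two (or five) requests must be consecutive and in the same direction, a detail the description above does not spell out.

```python
class HysteresisGate:
    """Let a quality change through only after enough repeated requests."""

    def __init__(self, down_after: int = 2, up_after: int = 5):
        self.down_after = down_after  # requests needed before degrading
        self.up_after = up_after      # requests needed before improving
        self.direction = None         # direction of the current streak
        self.count = 0                # length of the current streak

    def request(self, direction: str) -> bool:
        """Register a 'down' or 'up' request; return True if it should be applied."""
        if direction not in ("down", "up"):
            raise ValueError("direction must be 'down' or 'up'")
        if direction != self.direction:
            # Assumption: a request in the other direction restarts the streak.
            self.direction = direction
            self.count = 0
        self.count += 1
        needed = self.down_after if direction == "down" else self.up_after
        if self.count >= needed:
            self.count = 0
            return True
        return False


gate = HysteresisGate()
print([gate.request("down") for _ in range(2)])
print([gate.request("up") for _ in range(5)])
```

With these thresholds a single isolated request never changes the quality, which is what suppresses rapid oscillation; degradation is still granted faster than improvement.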

Another property of our algorithm is that it has a content-aware media scaling 
system. When adaptation is required, quality scaling is used by switching to a version 
at a lower encoding rate during the transmission of the video segments which contain 
high motion whereas temporal scaling (i.e. frame dropping) takes precedence over 
quality scaling during the transmission of the video portions with low motion content. 

Since there is no bandwidth estimator in [5], when all goes fine, i.e. the receiver 
buffer level is within the desired interval and there is no packet loss in the network, 
the algorithm probes for bandwidth by increasing video quality until it experiences 
packet loss. Then, it decreases quality but some packets have already been lost in the 
network. To prevent this negative effect of probing particularly at the initial buffer 
filling phase, a good estimation of the available bandwidth could be of great help. 
However, such an estimator may bring in considerable overhead to the streaming 
algorithm and it may degrade overall performance. In the next section, we will 
introduce a bandwidth estimator that is suitable for embedding into our algorithm. 



3 Estimation of Available Bandwidth 

In recent years, there has been significant progress in the area of bandwidth 
estimation. Several papers in this area discuss capacity (i.e. bottleneck 
bandwidth) and available bandwidth estimation methodologies. Among them, Van 
Jacobson’s Pathchar [6] determines per-hop capacity by using variable packet 
sizes. Lai and Baker’s Nettimer [7] determines end-to-end path 
capacity via Packet Pairs. These two measure only bottleneck bandwidth. For our 
purposes, there is another measure called available bandwidth which provides better 
feedback for our streamer. Cprobe [8], Pathload [9], IGI [10] and Pathchirp [11] 
develop available bandwidth measurement methods. 

Cprobe is the first tool to attempt to measure available bandwidth. However, it 
ignores the fact that bottleneck bandwidth is not the same as available bandwidth. It 
also requires some privileges on routers such as ICMP messaging. Cprobe is not 
very useful nowadays because network administrators are very concerned about 
attacks based on ICMP messages. For wide area network measurement, it is useful to 
work with other packet types. 

Pathload is based on the Self Loading Periodic Streams (SLoPS) methodology. A SLoPS 
stream consists of K packets of size L, sent to the destination at a constant rate R. If 
the stream rate R is higher than the available bandwidth, the one-way delay of successive 
packets at the receiver shows an increasing trend. By trying different probing speeds, an 
estimate of the available bandwidth can be found. The Pathload implementation uses UDP 
packets that require no privileges on routers. Its main disadvantage is reported to be low 
estimation speed [10]. 

IGI and Pathchirp are newer tools that use modified Train of Packet Pairs (TOPP) 
[11] and SLoPS. They use different packet pair streams and they can achieve similar 
accuracy in small measurement times [9]. The Packet Pair Technique (PPT) is a common 
approach for bandwidth estimation [12]. Since the approach basically depends on 
observing the dispersion of active probing packets at the receiver, it is also referred to 
as Packet Pair Dispersion. The active measurement probes are injected into the network 
by the sender and addressed to the receiver. By measuring the change in spacing between 
consecutive packets at the destination, the network path properties can be estimated. The 
model used in PPT is given in Fig. 1 [13]. 

Our Packet Pair Model is similar to the IGI/TOPP model except that we send a single 
train of packet pairs in each experiment and use small constant initial gaps rather than 
increasing initial gaps. This is due to the fact that determining a good value 
for the initial probing gap is much more difficult. It is possible to determine a better 
gap value by making several experiments in which a sequence of packet trains with 
increasing initial probing gaps is sent. However, these experiments take some time, 
and the instantaneous available bandwidth may be missed. Another important 
property of our algorithm is that it is 2-5 times faster than the IGI method in estimating 
bandwidth. 

In each experiment, the sender injects probe packets into the network at regular 
intervals denoted by $\Delta T_s^n$ and the destination receives them at intervals denoted 
by $\Delta T_r^n$. To get more accurate results, the packet pair train is formed from 
fixed-size packets of size P. 

If there is no additional traffic on the path, bottleneck bandwidth is also called 
“path capacity”, that is, the maximum throughput that the path can provide. We denote 
it as B_BW. Bottleneck bandwidth should not be confused with the available 
bandwidth of a path. Available bandwidth is the spare capacity that is not used by 
the existing traffic. 

We assume that there is no cross traffic in the path and the complete path is formed 
of K links. We measure the smallest bottleneck bandwidth of the present links [12]: 

$$B\_BW = \min_{i = 0, \ldots, K-1} \{ B\_BW_i \} . \qquad (1)$$ 

The competing traffic ratio, denoted by $\alpha$, is the ratio of increased gaps to 
received gaps. Motivated by [10], the competing traffic throughput, denoted by C_BW, is 
given as 

$$C\_BW = B\_BW \cdot \alpha . \qquad (2)$$ 

Finally, we estimate the available bandwidth by subtracting competing traffic 
throughput from bottleneck bandwidth, 

$$A\_BW = B\_BW - C\_BW . \qquad (3)$$ 
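
The estimate can be traced numerically with a short Python sketch. It is an illustration under stated assumptions, not our estimator: the function name `estimate_available_bw` is hypothetical, and the competing traffic ratio is computed under one possible reading of "increased gaps to received gaps", namely the total amount by which received gaps exceed the sent gaps, divided by the sum of all received gaps.

```python
def estimate_available_bw(b_bw, sent_gaps, recv_gaps):
    """Estimate available bandwidth from one train of packet pairs.

    b_bw:      bottleneck bandwidth (any rate unit, e.g. kbps)
    sent_gaps: gaps between probe packets at the sender
    recv_gaps: gaps between the same packets at the receiver
    """
    if len(sent_gaps) != len(recv_gaps) or not recv_gaps:
        raise ValueError("gap lists must be non-empty and of equal length")
    total_recv = sum(recv_gaps)
    if total_recv <= 0:
        raise ValueError("received gaps must sum to a positive value")
    # Assumed definition of the ratio: growth of the gaps over all received gaps.
    increased = sum(max(r - s, 0.0) for s, r in zip(sent_gaps, recv_gaps))
    alpha = increased / total_recv
    c_bw = b_bw * alpha   # competing traffic throughput
    return b_bw - c_bw    # available bandwidth


# Four gaps sent 1.0 ms apart; two of them are stretched by cross traffic.
print(estimate_available_bw(1000.0, [1.0] * 4, [1.0, 1.5, 1.0, 2.0]))
```

When no gap grows, the ratio is zero and the estimate equals the bottleneck bandwidth, which agrees with the no-cross-traffic case of the packet pair model.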

Fig. 1. Packet Pair Model without any competing traffic 



4 A Bandwidth-Aware Streaming Algorithm 

We developed and implemented a bandwidth-aware streaming algorithm that can 
operate efficiently on both local and wide area networks. This algorithm is integrated 
into our previously developed rate adaptive algorithm that utilizes the loss statistics and 
receiver buffer status. 

First, the channel bandwidth estimator module is implemented stand-alone and its 
performance is optimized to meet our streaming environment requirements. Then the 
module is embedded into the rate adaptive streaming algorithm. Combining available 
bandwidth estimation with rate adaptation has two important advantages. First, 
available bandwidth is used to decide on the initial video quality to be sent during the 
pre-buffering period. The estimated bandwidth value lets the algorithm choose the most 
appropriate encoding rate to start streaming. Hence, heavy load during the initial buffer 
filling period is avoided. Second, when the quality is to be increased, available 
bandwidth is very useful for checking whether the current channel capacity meets the 
new bandwidth requirement. In this way, the unnecessary quality increases in the 
adaptation algorithm, which may cause packet loss and congestion, are avoided. 
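
The initial quality decision can be sketched as a lookup over the pre-encoded rates. The Python below is a minimal illustration: the function name `pick_initial_rate` is hypothetical, the rate list is the one given in Section 2, and falling back to the lowest rate when even that exceeds the estimate is an assumption of this sketch.

```python
ENCODING_RATES_KBPS = (1000, 500, 200, 100)


def pick_initial_rate(available_bw_kbps, rates=ENCODING_RATES_KBPS):
    """Pick the highest encoding rate that fits in the estimated bandwidth."""
    candidates = [r for r in rates if r <= available_bw_kbps]
    # Assumption: if nothing fits, start at the lowest available rate.
    return max(candidates) if candidates else min(rates)


print(pick_initial_rate(550))
```

For an estimate of about 550 kbps the sketch selects the 500 kbps stream rather than starting at 1000 kbps and overloading the path during pre-buffering.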

Depending on the observed measures of loss rate, ttp, dttp, video dynamics and 
current values of ER-FDL-PI, the algorithm determines new values for ER-FDL-PI. If 
conditions impose a decrease in quality, then no further step is taken and the new values 
of ER-FDL-PI are applied to streaming. Since RTCP reports are at least as fast as our 
bandwidth estimator in informing the sender of congestion, the estimated 
bandwidth value is not used when the quality is to be decreased. Hence, no advance 
information is available in this particular case. 

If the new adaptation is not in the direction of decreasing data rate, then the newly 
proposed data rate (put_bw) is compared with the estimated available data rate. If put_bw 
is less than the available bandwidth, then the chosen ER-FDL-PI is applied to streaming. If 
put_bw is more than the available bandwidth, the current ER-FDL-PI values remain the 
same. This last case occurs when the streaming application is already using a large share 
of the bandwidth and there is no more bandwidth available for a quality increase. A sketch 
of the algorithm is given in Fig. 2, where the procedures down_scale_video and 
up_scale_video correspond to the original adaptation algorithm [5] with rate decrease 
and increase, respectively. 

It is important to note that the bandwidth estimator module is run at the receiver and it 
operates in full synchronization with other modules of the algorithm. In particular, the 
observation parameters and estimated available bandwidth are updated every five 
seconds and the decision process at the sender is not delayed by the bandwidth 
estimator module. 



bandwidth_aware_scale( ) 
{ 
    observe loss_rate, ttp, dttp, video_dynamics, available_bw; 
    if ( congestion ) 
        down_scale_video(loss_rate, ttp, dttp, video_dynamics); 
    else { 
        compute put_bw; 
        if ( put_bw < available_bw ) 
            up_scale_video(ttp, dttp, video_dynamics); 
        else 
            no_scale; 
    } 
} 



Fig. 2. Bandwidth-aware video scaling algorithm 
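
The decision logic of Fig. 2 can also be written as a small Python function for experimentation. This is a sketch only: the function name mirrors the figure, but the boolean `congested` flag and the returned action strings are assumptions standing in for the congestion test and for the down_scale_video and up_scale_video procedures of [5], whose internals are not reproduced here.

```python
def bandwidth_aware_scale(congested: bool, put_bw: float, available_bw: float) -> str:
    """Return the scaling action: 'down', 'up' or 'hold'."""
    if congested:
        # Quality decreases never wait for the bandwidth estimate.
        return "down"
    if put_bw < available_bw:
        # The proposed rate fits into the estimated available bandwidth.
        return "up"
    # Not enough spare bandwidth: keep the current ER-FDL-PI values.
    return "hold"


print(bandwidth_aware_scale(False, 1000.0, 550.0))
```

In the example a proposed rate of 1000 kbps against an estimate of 550 kbps yields no change, which is the case the estimator is meant to catch before any packet is lost.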



5 Experimental Results 

The bandwidth-aware scaling algorithm introduced in the previous section has been 
tested with the video streaming system that we have implemented. In the system, RTP 
has been used for data transfer and RTCP has been used to collect network statistics. 
Control messages are exchanged over UDP. Our streaming system has client-server 
architecture. The clients request video from the server. Server streams video to the 
client in a unicast manner. Both the server and client software are multithreaded. 
Pipelined architecture of the client software further increases the performance of the 
whole system. Detailed explanation of our testbed can be found in [4, 5]. 

Experiments have been performed in the actual Internet environment between two 
Sun Ultra 5 workstations. The workstation which acts as the streaming server is 
located on the Koc University Campus in Istanbul. The client workstation is located on 
the Ege University Campus. The traceroute command shows that the number of hops 
between the two workstations is 9. 

We carried out two sets of experiments to observe the performance of our 
bandwidth-aware scaling algorithm. The first set of experiments was carried out late 
at night when the amount of traffic on the network was low, while the second set of 
experiments was conducted during busy working hours when the load was higher. In 
both sets of experiments, to be able to evaluate the benefits of our bandwidth-aware 
scaling algorithm, we collected performance results under two configurations. In the 
first configuration, bandwidth-aware scaling is turned off, while in the other, it is 
turned on. 

The results presented in Fig. 3 and Fig. 4 belong to an example from the first set of 
experiments. Figure 3 shows system variables when adaptation is performed without 
bandwidth considerations. As seen in Fig. 3.b, during the pre-buffering period, 
transmitted video quality can reach its highest level, in which the encoding rate is 
1000 kbps (ER = 0) and the frame rate is 30 fps (FDL = 0). However, the estimated 
available bandwidth is around 550 kbps. Therefore, our system overloads the 
network, available bandwidth decreases sharply and congestion develops soon after 
streaming begins. Our adaptation module decreases video quality by following our 
content-aware scaling algorithm. As the video quality decreases, congestion eases 
and available bandwidth increases. As a consequence, the adaptation module 
increases video quality by adjusting the encoding rate and frame rate. At sample=35, 
the encoding rate rises to 1000 kbps. This bit rate overloads the network and congestion 
develops again. In response, the adaptation module decreases video quality again. 
This cycle repeats itself throughout the transmission, generating self-similar traffic 
in the network. Frequent quality switches have been observed as the streaming 
proceeds. 

Fig. 3. System variables in lightly loaded network without bandwidth-aware scaling. 

[Figure panels: (a) smoothed loss % and A_BW; (b) ER and FDL] 

Fig. 4. System variables in lightly loaded network with bandwidth-aware scaling. 

Fig. 4 shows system variables when our bandwidth-aware scaling algorithm is 
applied. Prior to streaming, the initial network bandwidth is estimated and the 
proper quality level is determined by comparing the available bandwidth with the 
encoding rates of the pre-encoded streams. According to Fig. 4.a, the available 
bandwidth is between 500 and 600 kbps, and the highest encoding rate that does not 
exceed it is 500 kbps. Therefore, transmission starts with this encoding rate 
(500 kbps) and a frame rate of 30 fps (FDL = 0). Before switching to a higher 
quality video, our streaming system calculates the bit rate requirement and compares 
this value with the available bandwidth. Since the available bandwidth does not 
exceed 600 kbps, the adaptation module does not increase video quality. Because this 
experiment was conducted late at night, when the Internet is lightly loaded, and our 
system transmits video packets at a rate that complies with the available bandwidth, 
no loss is reported through RTCP, and transmission proceeds at the initial video 
quality, without invoking the adaptation module, until sample=120. At that point the 
buffer level falls below a predetermined threshold, and the decrease is compensated 
by lowering the video quality to 200 kbps. Lower quality video puts fewer packets 
into the network, which lowers end-to-end delays and raises the input rate to the 
buffer, in turn increasing buffer occupancy. 
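
The start-up and upgrade decisions described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, and the rate ladder contains only the encoding rates that appear in the text (100, 200, 500, and 1000 kbps), which may not be the complete set of pre-encoded streams.

```python
# Encoding rates (kbps) mentioned in the text; the full ladder is an assumption.
ENCODING_RATES_KBPS = [100, 200, 500, 1000]


def select_initial_rate(available_bw_kbps, rates=ENCODING_RATES_KBPS):
    """Return the highest encoding rate that does not exceed the estimate.

    Falls back to the lowest rate when even that exceeds the estimate.
    """
    feasible = [r for r in rates if r <= available_bw_kbps]
    return max(feasible) if feasible else min(rates)


def may_increase_quality(current_rate_kbps, available_bw_kbps,
                         rates=ENCODING_RATES_KBPS):
    """Allow a switch to the next higher rate only if the bandwidth covers it."""
    higher = [r for r in rates if r > current_rate_kbps]
    return bool(higher) and min(higher) <= available_bw_kbps


if __name__ == "__main__":
    # Roughly the situation of the lightly loaded run (about 550 kbps available).
    print(select_initial_rate(550))
    print(may_increase_quality(500, 600))
```

With an estimate between 500 and 600 kbps, the sketch starts at 500 kbps and refuses the step to 1000 kbps, which matches the behavior reported for Fig. 4.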




Fig. 5. System variables in periodically loaded network without bandwidth-aware scaling. 

Fig. 6. System variables in periodically loaded network with bandwidth-aware scaling. 



Figure 5 and Fig. 6 demonstrate the behavior of our system in a periodically loaded 
network. The available bandwidth varies more in these figures than in the lightly 
loaded scenarios, because the competing traffic on the Internet varies considerably 
in this case. Figure 5 shows the status of system variables when the bandwidth-aware 
scaling algorithm is switched off. The behavior of the system is very similar to the 
case given in Fig. 3. Figure 6 presents the performance results when bandwidth-aware 
scaling is applied. We observed that the number of congested intervals is reduced 
when bandwidth is taken into account in rate control decisions. The quality changes 
in Fig. 6.b indicate that congestion is caused by traffic generated by other sources 
on the Internet rather than by our streaming software. This is best observed during 
the initial buffer filling periods. Upon detecting the congestion, our adaptation 
module reacted properly by decreasing video quality. Comparing Fig. 5.b and Fig. 6.b 
shows that including the bandwidth estimation module decreased the number of quality 
changes, resulting in more stable video quality settings. Additionally, overall 
quality was higher when the bandwidth-aware algorithm was applied. For example, the 
encoding rate was decreased to the worst level (ER = 100 kbps) at sample = 6 in 
Fig. 5.b. In contrast, the prevailing encoding rate was 500 kbps until sample = 33 
in Fig. 6.b, thanks to the initial bandwidth estimation procedure. 

To summarize, the experimental results support our hypothesis that bandwidth 
awareness can prevent congestion caused by probing and reduce the effects of 
congestion caused by traffic from external sources. It also reduces quality 
fluctuations by avoiding unnecessary invocations of the adaptation module. 
Similarly, initial bandwidth detection eliminated congestion at the start of the 
streaming process. Finally, overall perceptual quality was higher with bandwidth 
awareness, which reduces the disturbance experienced by the viewer. 



6 Conclusions 

In this study, we developed a bandwidth-aware streaming algorithm and reported 
performance results. We proposed a mechanism to measure the available bandwidth in 
the network; the channel bandwidth is estimated using the Pair Packet Dispersion 
technique. Our adaptive streaming system checks the available bandwidth before 
performing any increase in quality and/or transmission rate. We observed that our 
algorithm is particularly effective during the initial buffer filling period. We 
compared test results of the algorithm with and without the bandwidth estimator, 
both obtained in a WAN environment. The results show that the bandwidth-aware 
streaming algorithm avoids packet loss and congestion caused by quality increases. 
At the same time, it reacts robustly to congestion caused by other factors 
while maintaining acceptable and interrupt-free perceptual quality. 

Abstract. Various hacking tools that dig out the vulnerabilities of hosts on a 
network have recently become easily accessible through the Internet, and 
large-scale network attacks have increased over the last decades. To detect these 
network attacks, companies and organizations have developed a variety of schemes. 
However, many previous schemes are known to be weak at protecting important hosts 
from network attacks because of their inherent unconditional filtering mechanisms: 
they filter out doubtful network traffic unconditionally, even when it is normal 
network traffic. Therefore, to filter out only abnormal network traffic, 
step-by-step filtering capabilities are required. In this paper, we propose a 
framework for controlling abnormal network traffic. This framework is implemented 
through the CBQ and iptables mechanisms in the Linux kernel. 



1 Introduction 

Along with the widespread use of the Internet, abnormal network traffic that 
wastes useful network bandwidth has increased considerably [10]. Many schemes for 
detecting network attacks have been developed, but they are known to be weak at 
protecting hosts from network attacks because they have undesirable features such 
as an unconditional filtering mechanism: they filter out doubtful network traffic 
unconditionally, even when it is normal network traffic. Therefore, to filter out 
only abnormal network traffic, more active response capabilities are required. 

In this paper, we propose a framework for controlling abnormal network 
traffic. This framework was implemented in a Linux-based routing system. It 
limits the bandwidth available to alerted network traffic using the CBQ mechanism 
and drops attack traffic using the iptables mechanism [6,3]. 

The rest of the paper is organized as follows. Section 2 outlines background 
information and the mechanisms used by our framework. Section 3 describes 
the overall architecture of our framework and the detailed algorithms of each 
component in the framework. Section 4 describes in detail how our framework 
works and compares it with other security systems. Section 5 concludes the paper 
and outlines future work. 



2 Background 

2.1 Netfilter 

Netfilter can be regarded as a set of hooks inside the network stack of the Linux 
2.4.x kernel that allows kernel modules to register callback functions, which are 
called whenever a network packet traverses one of those hooks [4]. In our 
proposed framework, the specific functions required to disassemble and 
analyze packets are registered at the appropriate hooks. All network packets are 
analyzed by these functions and, depending on the result, either continue their 
traversal or are dropped. The hooks of netfilter are illustrated in Figure 1. 




2.2 Kernel Module and Kernel Thread 

Kernel module programming is a technology that can be used to support 
various hardware devices in Linux systems [11]. According to the needs of device 
drivers or file systems, the corresponding module can be inserted or removed 
dynamically without modifying the kernel or shutting down the system. 

The kernel thread concept can be implemented in a kernel module. When it is 
adopted in kernel module programming, the kernel module can have many of the 
useful features that application-level processes have [1,2]. Accordingly, 
once a program is appropriately implemented as a kernel module with the kernel 
thread concept, it can improve system performance, increase execution 
speed, and simplify the programming process. 

2.3 The Iptables and the CBQ Mechanisms 

Iptables is a generic table structure for the definition of IP packet filtering 
rule sets. Each rule within an IP table consists of a number of classifiers 
(matches) and one connected action (target) [4]. Packets are filtered after they 
are compared with the rules in an IP table. CBQ is one of the classful 
qdiscs (queueing disciplines), which are very useful when different kinds 
of traffic should receive different treatment [5]. With this mechanism, specific 
network traffic can be prevented from monopolizing the whole network bandwidth. 

In this paper, when abnormal network traffic is detected, the traffic can 
either be placed on a special output queue whose bandwidth is restricted 
by the CBQ mechanism or be dropped by the iptables mechanism. 
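
To make the "classifiers plus one target" structure of a rule concrete, the following Python sketch models first-match rule evaluation. It is a toy model under stated assumptions, not the iptables implementation: the class names, the target labels ("DROP", "ALERT_QUEUE", "ACCEPT"), and the exact-match semantics are illustrative, and real iptables additionally has tables, chains, and many more match types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int


@dataclass(frozen=True)
class Rule:
    target: str                      # illustrative labels, e.g. "DROP"
    src: Optional[str] = None        # None means "match anything"
    dst: Optional[str] = None
    sport: Optional[int] = None
    dport: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        """A rule matches when every classifier it sets equals the packet field."""
        pairs = (
            (self.src, pkt.src), (self.dst, pkt.dst),
            (self.sport, pkt.sport), (self.dport, pkt.dport),
        )
        return all(want is None or want == got for want, got in pairs)


def evaluate(rules, pkt, default="ACCEPT"):
    """Return the target of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.target
    return default
```

In this model, a rule that only sets the source address would send all traffic from that source to the restricted queue, while a more specific rule placed before it could drop one particular flow.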

3 Implementation 

We describe the system architecture of our framework and the detailed algorithms 
of the proposed anomaly traffic control system. The overall architecture 
of our framework and the description of each component are shown in Figure 2. 




Fig. 2. The architecture of the framework 



3.1 Network Layer Module 

The network layer module consists of the PF (Packet Filtering) module and the 
QA (Queue Assignment) module. The PF module, with the iptables mechanism, 
drops the packets that match the filtering rules. The QA module, with 
the CBQ mechanism, places network traffic on the output queues according to the 
queuing rules. When network traffic is alerted as abnormal traffic by the IA 
(Intrusion Analysis) module, the QA module can restrict the output bandwidth 
of the traffic. Also, when network traffic matches the filtering rules or when 
alerted network traffic is confirmed as attack traffic by the IA module, the PF 
module can drop it with filtering rules. The filtering and queuing rules consist 
of S (source address), D (destination address), Ps (source port), and Pd 
(destination port). 

The filtering rules can be applied at the (1) NF_IP_PRE_ROUTING hook, and 
the queuing rules can be applied at the (4) NF_IP_POST_ROUTING hook, as shown 
in Figure 1. These rules can be made automatically by the IP module or manually 
by the network security manager. 



3.2 Packet Analysis Module 

The PA (Packet Analysis) module disassembles the headers of the network packets 
that pass the PF module and queues them for the IA module. The algorithm for the 
PF/PA module is illustrated in Figure 3. 



/* Rie: ingress/egress filtering rules */ 
/* Ript: filtering rules made by the IP module or the security manager */ 
/* Mtcp = {F,S,D,Ps,Pd}, Mudp = {S,D,Ps,Pd}, Micmp = {S,D,T,C} */ 

/* When a packet P arrives, it is hooked at the NF_IP_PRE_ROUTING position */ 
P is passed to the PF module and the procedure OP1 is processed; 
if (P successfully passes PF module) { 
    P is passed to the PA module and the procedure OP2 is processed; 
    P is passed to the upper protocol layer or network interfaces; 
} else { P is dropped; } 

Procedure OP1: 
/* P is checked whether its pattern matches the filtering rules */ 
if ((the pattern of P) ∈ Rie) { P is dropped; } 
if ((the pattern of P) ∈ Ript) { P is dropped; } 
P is passed to the upper protocol layer or network interfaces; 

Procedure OP2: 
/* When PA module receives network packet P that successfully passed the PF module */ 
switch (the protocol of P) { 
    case TCP:  sends Mtcp to the Qtcp in the IA module; break; 
    case UDP:  sends Mudp to the Qudp in the IA module; break; 
    case ICMP: sends Micmp to the Qicmp in the IA module; break; 
} 

Fig. 3. The algorithm of the PF/PA module 

When network packets that passed through the PF module normally arrive 
at the PA module, their headers are disassembled and queued for the IA 
module. The message formats of the disassembled packet headers are Mtcp, Mudp, 
and Micmp for TCP, UDP, and ICMP, respectively. The meaning of each component of 
M is illustrated in Table 1. 



Table 1. The meaning of each M's component 

Component     Meaning 
F             URG, PSH, SYN, FIN, ACK, RST 
S (or D)      Source (Destination) IP address of the packet 
Ps (or Pd)    Source (Destination) port number of the packet 
T (or C)      ICMP message type (code) of the packet 



The messages Mtcp, Mudp, and Micmp, which carry the disassembled packet 
headers, are queued into the queues Qtcp, Qudp, and Qicmp of the IA module, 
respectively. Afterwards, the information in each queue is used for intrusion 
analysis. 
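
The disassembly and queuing step (procedure OP2 and Table 1) can be sketched in user-space Python as follows. This is an illustration only: the framework described here runs inside the kernel, and the dictionary-based packet representation, the field names, and the helper names are assumptions made for the example.

```python
from collections import deque


def new_queues():
    """Create one FIFO per protocol, mirroring Qtcp, Qudp, and Qicmp."""
    return {"tcp": deque(), "udp": deque(), "icmp": deque()}


def disassemble(pkt):
    """Build the message M for a packet given as a dict of header fields."""
    proto = pkt["proto"]
    if proto == "tcp":      # Mtcp = {F, S, D, Ps, Pd}
        return (pkt["flags"], pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
    if proto == "udp":      # Mudp = {S, D, Ps, Pd}
        return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
    if proto == "icmp":     # Micmp = {S, D, T, C}
        return (pkt["src"], pkt["dst"], pkt["type"], pkt["code"])
    raise ValueError(f"unsupported protocol: {proto}")


def packet_analysis(pkt, queues):
    """OP2: queue the disassembled header for later intrusion analysis."""
    queues[pkt["proto"]].append(disassemble(pkt))
```

Keeping one queue per protocol lets the analysis step apply protocol-specific statistics without re-inspecting the packets.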



3.3 Intrusion Analysis and Prevention Module 

The intrusion analysis and prevention module consists of the IA module and the 
IP (Intrusion Prevention) module. The IA module analyzes the disassembled 
packet headers in the queues, and either alerts the corresponding traffic as 
abnormal network traffic or detects it as attack traffic. Because the IA module 
is registered with the kernel timer, it periodically reads the messages in the 
queues, analyzes them, and updates the state information of the network traffic. 
Using a statistical method, the state information is used to judge whether network 
traffic is abnormal or attack traffic, with thresholds and variances that must 
already be defined in the configuration file. This module generally analyzes the 
messages at a predefined interval T, and it also analyzes the state information at 
intervals of 4T, 16T, 64T, 128T, and 256T in order to detect traffic that 
continuously tries to attack a host over a long time [7,8,9]. The analysis result 
of the IA module is passed to the IP module, which sets up filtering or queuing 
rules according to the results; those rules are applied to the QA module. 
Additionally, the IA module can accept intrusion detection information from 
external security systems through the configuration module. This information is 
sent to the IP module and is used for filtering attack traffic. 
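
The idea of checking the same traffic at several time scales can be sketched as below. This is a simplified illustration, not the paper's statistical method: it uses plain packet-count thresholds per window (the paper also mentions variances), and the class name and threshold values are hypothetical. Only the window lengths T, 4T, 16T, 64T, 128T, and 256T come from the text.

```python
from collections import deque

WINDOWS = (1, 4, 16, 64, 128, 256)   # window lengths in units of T, as in the text


class MultiScaleCounter:
    """Track per-interval packet counts for one traffic source."""

    def __init__(self, thresholds):
        # thresholds maps a window length (in units of T) to a packet limit;
        # the limits are illustrative and would come from a configuration file.
        self.thresholds = thresholds
        self.history = deque(maxlen=max(WINDOWS))
        self.ticks = 0

    def tick(self, packets_in_interval):
        """Record one interval of length T; return the windows whose limit was exceeded."""
        self.history.append(packets_in_interval)
        self.ticks += 1
        exceeded = []
        for w in WINDOWS:
            # A window of length w is evaluated once every w intervals.
            if w in self.thresholds and self.ticks % w == 0:
                recent = list(self.history)[-w:]
                if sum(recent) > self.thresholds[w]:
                    exceeded.append(w)
        return exceeded
```

A slow but persistent source can stay under the limit for every single interval T and still exceed the limit of a longer window, which is the case the longer intervals are meant to catch.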

Accordingly, when normal network traffic using the normal queue is alerted 
as abnormal traffic, the traffic is allocated to the alert queue by queuing rules. 
When alerted network traffic using the alert queue is detected as attack traffic, 
or when normal traffic matches the filtering rules, the traffic is dropped 
by the filtering rules. Figure 4 shows the treatment of network traffic by the 
filtering and queuing rules. 

Fig. 4. The treatment of network traffic by filtering and queuing rules 
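
The step-by-step treatment of a flow can be read as a small state machine, sketched here in Python. The state names and the `flagged` parameter are hypothetical labels introduced for the example; whether an alerted flow can later return to the normal queue is not stated in the text, so the sketch simply leaves such a flow in the alert state.

```python
NORMAL, ALERT, DROPPED = "normal", "alert", "dropped"


def next_state(state, flagged, matches_filter_rule=False):
    """Advance the treatment of one flow after an analysis round.

    flagged: the IA module judged the flow abnormal (or attack) in this round.
    matches_filter_rule: the flow matches an existing filtering rule.
    """
    if state == DROPPED or matches_filter_rule:
        return DROPPED                 # filtering rules drop the traffic
    if state == NORMAL and flagged:
        return ALERT                   # moved to the bandwidth-limited alert queue
    if state == ALERT and flagged:
        return DROPPED                 # flagged again while on the alert queue
    return state
```

The intermediate alert state is what distinguishes this scheme from unconditional filtering: suspicious traffic is first throttled and is dropped only if it is flagged again.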



3.4 Configuration and Log Modules 

The configuration module can accept intrusion analysis information from 
external security systems to detect a variety of new attacks, check the 
sanity of the information, and pass it directly to the IA module. The 
log module logs information on the filtering and queuing rules made by the IP 
module. A network security manager can monitor the system and configure the 
filtering rules through the web-based user interface. 

4 Test and Performance Comparison 

In order to test the functionality of our framework, we deployed the framework on 
a Linux-based routing system, mclab, and monitored the packets from attackers 
(192.168.x.2) to the target network (192.168.1.0). Our testbed is illustrated in 
Figure 5. As illustrated in Figure 6, the CBQ configuration adopted in the QA 
module is set up in mclab. 

The output generated at mclab was examined after artificial attack traffic was 
generated with attack tools (knocker, tfn2k). We used the output messages 
recorded in '/var/log/messages' on mclab. In Figure 5, when network traffic comes 
from attacker-1 to the target network, it passes through the normal queue of 
mclab and can use at most 90% of the output bandwidth, because the remaining 
bandwidth (10%) is allocated for reserved network traffic. When abnormal network 
traffic comes from attacker-1 to the target network, mclab analyzes the traffic 
and alerts it as abnormal network traffic. This alerted network traffic is passed 
to the alert queue of mclab. When alerted network traffic is alerted by mclab 
again, it is dropped. When attack network traffic that matches the filtering 
rules in mclab comes from attacker-1, it can be dropped directly. This 
step-by-step response against abnormal network traffic with the CBQ and iptables 
mechanisms makes it possible to decrease the false positive rate and to control 
the network traffic dynamically. Figure 7 shows the detection process of the DoS 
attack, in which the attack traffic is dropped. Figure 8 shows the state 
information of the three queues into which the output bandwidth of the system is 
divided by the CBQ mechanism. 

Fig. 5. Testbed 

Reserved queue (ID 1:1): allocated bandwidth 10 - 100 % 
Normal queue (ID 1:2): allocated bandwidth 0 - 90 % 
Alert queue (ID 1:3): allocated bandwidth 0 - 10 % 

Fig. 6. The CBQ configuration 




Fig. 7. The detection of the DoS attack 



Table 2 compares our framework with NIDS, NIPS (Network-based 
Intrusion Prevention System), and NTMS (Network Traffic Management System) 
in terms of aspects such as intrusion detection capability, traffic control 
capability, and false positive rate. NIDS offers a high detection rate of attack 
traffic because it is developed to detect a variety of attacks, but it may be 
passive toward intrusions because it focuses only on intrusion detection. NIPS 
offers an active prevention mechanism, but it incurs high packet loss, eventually 
increasing the false positive rate. In general, NTMS offers a QoS mechanism to 
spread network traffic load. In the security area, NTMS can control network 
traffic that seems to be malicious by allocating limited bandwidth to that 
traffic, but it has no intrusion detection capabilities. Our proposed framework 
adopts the strength of NTMS and overcomes its weakness by implementing the IA 
module. This framework can decrease the false positive rate and can control the 
network traffic dynamically. 

[root@mclab /]# tc -s class show dev eth3 
class cbq 1: root rate 100Mbit (bounded,isolated) prio no-transmit 
 Sent 72700882 bytes 56271 pkts (dropped 0, overlimits 0) 
 borrowed 0 overactions 0 avgidle 0 undertime 0 
class cbq 1:1 parent 1: rate 10Mbit (isolated) prio no-transmit 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
 borrowed 0 overactions 0 avgidle 9 undertime 0 
class cbq 1:2 parent 1: rate 80Mbit prio no-transmit 
 Sent 28014436 bytes 21683 pkts (dropped 0, overlimits 0) 
 borrowed 0 overactions 0 avgidle 0 undertime -1996 
class cbq 1:3 parent 1: rate 10Mbit (bounded) prio no-transmit 
 Sent 44686404 bytes 34587 pkts (dropped 0, overlimits 0) 
 borrowed 0 overactions 0 avgidle 9 undertime 0 

Fig. 8. The state information of the three output queues 



Table 2. The performance comparison with NIDS, NIPS, and NTMS 

Aspect                  NIDS                NIPS                NTMS                Our framework 
Intrusion detection     High                High                No                  High 
Attack traffic control  No                  Medium              High                High 
False positive rate     No                  High                Medium              Medium 
Strength                Detection of        Dynamic prevention  Various traffic     Detection of various 
                        various attack      of attack traffic   control policies    attack traffic and 
                        traffic                                                     dynamic traffic 
                                                                                    control policies 
Weakness                No network traffic  Packet loss and     No intrusion        Portable only on the 
                        control             high false positive detection           Linux-based routing 
                                            rate                                    system 

5 Conclusion and Future Work 

In this paper, we proposed an anomaly traffic control framework based on the Linux 
netfilter and CBQ routing mechanisms. The concepts of kernel module and kernel 
thread are used for real-time detection of abnormal network traffic at the 
kernel level rather than at the application level, and the iptables and CBQ 
mechanisms are used for step-by-step filtering in order to decrease the false 
positive rate. To detect a variety of new abnormal network traffic, intrusion 
detection information from external security systems can be accepted through the 
configuration module. This information is sent to the IP module and is used for 
filtering attack traffic. 

In the future, performance evaluation and analysis will be performed in order 
to show that our framework is capable of faster detection and more real-time 
response than other systems implemented at the application level, and that 
the mechanism of step-by-step filtering is effective in controlling abnormal 
network traffic. The evaluation will compare the performance of pre-existing 
security systems with that of our proposed framework. 

Abstract. We propose a proximity-based overlay routing algorithm for 
accelerating service discovery using the distributed hash table (DHT)-based 
peer-to-peer (P2P) overlay in mobile ad hoc networks (MANETs). 
DHT systems are useful for resource-constrained MANET devices due to 
their relatively low communication overhead. However, overlay routing 
for service discovery is very inefficient, since the overlay structure is 
independent of the physical network topology. Our proximity-based overlay routing 
exploits the wireless characteristics of MANETs. A node that is physically closer 
to the destination is preferred over the logical neighbors, using information 
collected by 1-hop broadcast between a node and its physical neighbors. 
Further improvement is achieved by the shortcuts generated from 
service description duplication. In a detailed ns-2 simulation study, we 
show that the proposed scheme achieves an approximately shortest physical 
path without additional communication overhead, and also find that it 
works well in a mobile environment. 



1 Introduction 

A mobile ad hoc network (MANET) is a network of self-organized wireless mobile 
hosts, which is formed without the help of a fixed infrastructure. A service 
discovery protocol (SDP) enables services (which may be network devices, 
applications, or resources) to advertise themselves, and clients to issue queries 
and discover the services needed to complete specified tasks. SDP is an important 
and challenging component in the application layer for ad hoc communication 
and collaboration, because a MANET does not assume any centralized 
or publicly known fixed server, and the network topology varies incessantly 
during its lifetime. SDPs for MANETs are usually based on network-wide 
broadcasting and matching algorithms, and some improvements, such 
as caching and grouping, have been developed because the broadcasting cost is 
very high when the network grows large. 

On the other hand, peer-to-peer (P2P) overlay networks, which have been 
proposed for mutual information exchange among Internet users, are very 
similar to MANETs in that both are self-organized and distributed networks. 
Early P2P protocols were based on all-to-all broadcasting, like SDPs in 
MANETs, but they suffered from data redundancy, large amounts of traffic, 
and so on. Now, the structured approach using a distributed hash table (DHT) 
is gaining popularity in the research community, and the P2P strategy is 
being accepted as a novel technique for distributed resource management. 
We consider structured P2P protocols a good candidate for the SDP for 
MANETs to alleviate the redundancy and performance degradation. 

Applying a DHT-based system to MANETs raises some difficulties 
caused by the dynamic nature, resource constraints, and wireless 
characteristics of MANETs. Of those problems, we focus on the routing 
inefficiency of the service query and retrieval process. The DHT overlay structure 
and its routing algorithm do not take the physical topology into account at all, 
so the shortest logical path may be a very long physical path. Some topology-aware 
DHT routing algorithms [1] have been proposed, but they are not applicable to 
MANETs because they usually exploit fixed Internet topology information. 

In this paper, we propose an overlay routing scheme on the DHT-based overlay 
in MANETs that reduces the length of the physical routing path by exploiting the 
dynamic nature and wireless characteristics of MANETs. 

2 Related Work 

Service discovery protocols for distributed environments have mainly been designed 
for static networks, including the Internet, and they usually adopt a 
centralized directory-based approach, as in SLP (Service Location Protocol), 
UDDI (Universal Description, Discovery & Integration), and Jini. In 
MANETs, every node, including server nodes and directory nodes, may 
change its location and state, so assuming a designated 
centralized component is not appropriate. 

The simplest approach to distributed service discovery is a flooding-based 
algorithm, such as SSDP (Simple Service Discovery Protocol) of UPnP (Universal 
Plug and Play) and JXTA. Every node periodically advertises its own services to all 
other nodes (push model), or every query message is flooded into the 
network whenever a service is requested (pull model). The push model suffers 
from large storage requirements and outdated advertisements, and the pull 
model pays for excessive communication overhead. These drawbacks 
are more problematic in MANETs because devices in MANETs usually have 
limited storage, computing and communication capability, and power 
capacity. 

Some works have been proposed to reduce the storage or communication 
overhead by compromising between the pull and push models. One approach is caching 
and the other is grouping. In caching approaches [2,3], some intermediate nodes 
that satisfy specific conditions store the service advertisements in their storage. 
When a request message from outside the range arrives at a cache node, the service 
description can be returned to the requester without reaching the service providers. 
Lanes [4] follows the grouping approach under the assumption of IPv6 addressing. 
A group of nodes forms a Lane, and the members of a Lane share their service 
descriptions and have the same anycast address. A search message floods from 
one Lane to another by anycasting. GSD [5] is another grouping method that uses 
ontology information of the service description. 

On the other hand, the P2P networking mechanism, which has been developed for 
mutual information exchange among Internet users, is emerging as a MANET 
SDP or resource sharing method because it resembles the MANET in its 
self-organized, distributed, and ever-changing topology. Early 
P2P protocols based on the directory server approach (e.g. Napster) or all-to-all 
broadcasting (e.g. Gnutella) resemble the SDPs for MANETs and share the same 
problems. 

A DHT-based P2P system constructs an application-level overlay network 
on top of the physical network. Each resource or service is associated with a key, 
and each DHT node is responsible for a certain range of keys (zone). The basic 
function of a DHT is efficient key lookup, that is, routing a query to the node that 
stores the resource corresponding to a key. This involves constructing and 
maintaining a topology among peers and routing in the overlay network. Several 
topologies such as ring (e.g. Chord), hypercube (e.g. CAN), and tree (e.g. Pastry, 
Tapestry), and an overlay routing algorithm based on each topology, have been 
proposed. Most of them maintain information on O(log N) neighbors and a logical 
route length of O(log N) hops for N participating peers. Participating nodes 
periodically exchange control messages with their neighbors and keep the overlay 
routing table up to date. When a node joins or departs the overlay network, 
a special process must be performed to maintain the structure in order to 
support the full virtual key space. A query message travels the route established on 
the logical structure, though each overlay hop may consist of a number of hops 
in the underlying IP network. 

There have been some previous attempts to use DHT-based overlays for resource 
sharing in MANETs [4, 7, 8, 9], motivated by the similarities of the two 
networks: both are self-organizing and decentralized. [7] proposed an integrated 
architecture for P2P and mobile applications, and investigated the expected 
challenges. They pointed out that logical versus physical routing is a critical 
issue in the integration. In [4] and [8], the cost of maintaining a strict overlay 
structure is identified as the biggest obstacle, because of the highly dynamic 
nature and resource-limited devices. [4] weakened the conditions on the overlay 
structure, and [8] proposed on-demand structuring and routing. [9] proposed an 
IP-layer routing protocol that integrates a representative MANET routing 
algorithm, DSR, with an application-layer P2P protocol, Pastry. 

3 Proximity-Based Overlay Routing 

The distributed service trading process in MANETs consists of service advertisement 
and discovery. Each node advertises the services it shares to the other 
nodes by sending messages with service descriptions, and client nodes send 
service request messages into the network. The nodes that own a matching service 
or store the advertised service descriptions reply with the requested service 
descriptions, so that the client node can access the server node. 

In DHT-based P2P overlay networks, the basic functions are join, leave, insert,
update, and lookup, as described in Sec. 2. Our DHT-based service discovery
follows the same process and rules as the given overlay structure, with a few
differences. The main resource to be shared is not the service itself but its
service description (SD). Each SD is hashed to generate a key, and each node
stores and maintains (key, SD) pairs as well as the overlay routing table.
Service advertisement is performed by inserting (key, SD) at the node in charge
of the zone that contains the key. Participating nodes exchange routing
information periodically and keep the routing table updated, in the same way as
ordinary overlay maintenance. When a service is requested, the overlay routing
table is consulted and the request message travels along the route constructed
from it. Service discovery is thus performed by the lookup process on the
overlay.
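
As a minimal sketch of the advertisement and lookup steps described above, the following Python fragment hashes a service description to a point in a two-dimensional key space and stores the (key, SD) pair at the node whose zone contains that point. The hash function (SHA-1 split into two coordinates), the `Zone` class, and the four-quadrant layout are illustrative assumptions, since the paper does not specify its hashing mechanism. For brevity, the owner is resolved directly rather than by hop-by-hop overlay routing.

```python
import hashlib
from dataclasses import dataclass, field

def make_key(service_description):
    """Map a service description to a point in [0, 1) x [0, 1) (assumed hash)."""
    digest = hashlib.sha1(service_description.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return (x, y)

@dataclass
class Zone:
    """Rectangular zone [x0, x1) x [y0, y1) owned by one node (hypothetical type)."""
    owner: str
    x0: float
    x1: float
    y0: float
    y1: float
    ad_sd_table: dict = field(default_factory=dict)  # advertised (key, SD) pairs

    def contains(self, key):
        x, y = key
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

def owner_zone(zones, key):
    """Return the zone responsible for the key."""
    return next(z for z in zones if z.contains(key))

def advertise(zones, sd):
    """Service advertisement: insert (key, SD) at the node in charge of the zone."""
    key = make_key(sd)
    owner_zone(zones, key).ad_sd_table[key] = sd
    return key

def lookup(zones, sd):
    """Service discovery: look the key up at the responsible node."""
    key = make_key(sd)
    return owner_zone(zones, key).ad_sd_table.get(key)

# Hypothetical overlay: four nodes, each owning one quadrant of the key space.
zones = [
    Zone("A", 0.0, 0.5, 0.0, 0.5), Zone("B", 0.5, 1.0, 0.0, 0.5),
    Zone("C", 0.0, 0.5, 0.5, 1.0), Zone("D", 0.5, 1.0, 0.5, 1.0),
]
key = advertise(zones, "MP3 player service")
print(owner_zone(zones, key).owner, lookup(zones, "MP3 player service"))
```

Which node ends up owning the key depends only on the hash value, not on where the provider is physically located, which is the source of the path inefficiency discussed next.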

The logical overlay routing does not consider the physical topology at all, so
unnecessarily long paths often cause inefficiency and waste the limited
resources of MANET devices. We propose a scheme that shortens the route by
exploiting the wireless characteristics of MANETs.

In wired networks, only a subset of the nodes joins the P2P overlay network,
and overlay construction and routing are performed among the participating
peers only, while the underlying IP network carries the actual communication.
In a MANET, however, we assume that every node takes part in the overlay
network, because each node functions as a router as well as a working peer.
The service description language and the hashing mechanism are not described
because they are beyond the scope of this paper.

3.1 Consult Physical Neighbors 

The first modification of overlay routing uses information from physical
neighbors. The wireless devices in MANETs have a limited radio range, so only
the devices within transmission range can hear a message. Before forwarding
a service discovery request message, a node sends a request for neighbor
information to its physical neighbors by 1-hop broadcast. Each neighbor looks
up its routing table, computes the logical distance between the destination and
each entry in the table, and replies with the closest node and its distance.
If one of the neighbors is responsible for the requested key, it replies with
its own address and the distance 0. The requester collects the replies for a
certain period and selects the node that declares the shortest distance. That
is, each node considers its physical neighbors’ routing information before its
own logical neighbors, which increases the chance of choosing physically
shorter routes. We call this scheme CPN (Consult Physical Neighbors).
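
The exchange described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the logical distance is taken to be the Euclidean distance between points of the key space, each routing-table entry is represented by a single point, and a neighbor stays silent unless it improves on the requester's distance by more than a threshold (the role the threshold plays in the pseudocode later in this section). The names `neighbor_reply` and `select_next_hop` are hypothetical.

```python
import math

def distance(a, b):
    """Logical distance between two key-space points (Euclidean, an assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def neighbor_reply(address, routing_table, target, requester_distance,
                   threshold=0.2, owns_key=False):
    """Reply of one physical neighbor: (address, distance), or None if it stays silent.

    routing_table maps the address of a logical neighbor to a point of its zone.
    """
    if owns_key:
        return (address, 0.0)            # responsible node answers with distance 0
    if not routing_table:
        return None
    closest = min(routing_table, key=lambda a: distance(routing_table[a], target))
    d = distance(routing_table[closest], target)
    if d < requester_distance - threshold:
        return (closest, d)
    return None                          # suppressed to limit the number of replies

def select_next_hop(replies):
    """Requester side: choose the node that declares the shortest distance."""
    replies = [r for r in replies if r is not None]
    if not replies:
        return None                      # assumed fallback: ordinary overlay routing
    return min(replies, key=lambda r: r[1])[0]

# Hypothetical scenario: the requested key hashes to (0.8, 0.8) and node E owns it.
target = (0.8, 0.8)
replies = [
    neighbor_reply("B", {"E": (0.7, 0.9), "C": (0.2, 0.3)}, target, 0.9),
    neighbor_reply("E", {}, target, 0.9, owns_key=True),
    neighbor_reply("D", {"F": (0.1, 0.9)}, target, 0.9),
]
print(select_next_hop(replies))  # E
```

Node D's best entry is only marginally closer than the requester, so it does not reply, while E answers with distance 0 and is selected.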

Figure 1 shows a very simple example. The distance is computed based on
the structure of the key space, and a threshold is set to prevent too many
replies when the node density is high. In this example, the threshold value is
0.2. Though nodes B and E send the same response, E is chosen because the packet
from E contains E as its source address. If CPN is not used, the physical path
length is 3 (A -> B -> A -> E).




[Figure 1: panels (a), (b), (c)]

Message formats shown in the figure:

close_node_request (service, hash-key, distance)
close_node_response (address, distance)
service_discovery_request (service, hash-key, source)

Example message shown in the figure:

service_discovery_request (MP3, (0.8, 0.8), A)

Fig. 1. Use of physical neighbors’ routing information



3.2 Use Duplicated Description 

The second modification exploits the fact that each service description is
duplicated: it is stored at the node in charge of the key zone as well as at
the service provider. In addition to the advertised (key, SD) pairs
(Ad_SD_table), each node maintains a (key, SD) directory of its own services
(My_SD_table). When an SD request message arrives, the node looks up
My_SD_table first, so the requester can be answered with the physically closer
of the duplicated SDs (see Fig. 2). We call this scheme UDD (Use Duplicated
Description). UDD can be used with the usual overlay routing, but it is more
effective with CPN. In the usual overlay structure of N nodes, the probability
of meeting the service provider on the route to the designated key space is
1/N', where N' is N - (number of predecessors). In CPN routing, all physical
neighbors are checked before forwarding, so the probability goes up to k/N",
where N" is N - (total number of predecessors and their neighbors) and k is the
number of current neighbors. The higher the node density, the higher the
probability.
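
A small worked example of the two expressions above, written in Python with exact fractions. The node counts are hypothetical and chosen only to show how checking k neighbors per hop raises the chance of meeting the provider.

```python
from fractions import Fraction

def p_overlay(n, predecessors):
    """1/N' with N' = N - (number of predecessors)."""
    return Fraction(1, n - predecessors)

def p_cpn(n, predecessors_and_neighbors, k):
    """k/N'' with N'' = N - (predecessors and their neighbors), k current neighbors."""
    return Fraction(k, n - predecessors_and_neighbors)

# Hypothetical numbers: 100 nodes, 4 predecessors on the route so far,
# 20 nodes already covered (predecessors plus their neighbors), 6 current neighbors.
print(p_overlay(100, 4))   # 1/96
print(p_cpn(100, 20, 6))   # 3/40
```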
Fig. 2. Shortcut to physically close service descriptions



Integrating both acceleration methods, each node performs the following
algorithms as a sender or a receiver.

/* Sender side: start the retrieval of a service. */
void retrieveService (char *serviceRequest) {
    key = makeKey (serviceRequest);

    if (zone.isContain (key) || lookupMyService (serviceRequest)) {
        /* The key is in our own zone, or we provide the service (UDD). */
        retrievalSuccess ();
    } else if ((neighbor = zone.isNeighborContain (key)) >= 0) {
        /* A logical neighbor is responsible for the key. */
        forwardTo (neighbor, msg);
    } else {
        /* CPN: ask the physical neighbors by 1-hop broadcast. */
        logicalDistance = zone.distance (key);
        sendCloseNodeRequest (serviceRequest, key, logicalDistance);
        setBroadcastTimer ();
    }
}

/* The reply collection period is over: forward to the closest node. */
void broadcastTimerExpire (Event *) {
    closest = selectCloseNode ();
    forwardTo (closest, msg);
}

/* Receiver side: dispatch on the type of the received message. */
void recMsg (int size, char *msg) {
    switch (msgType) {
    case SERVICE_RETRIEVAL_REQUEST:
        if (zone.isContain (key) || lookupMyService (serviceRequest)) {
            sendRetrievalResponse (serviceRequest, serviceDescription);
        } else if ((neighbor = zone.isNeighborContain (key)) >= 0) {
            forwardTo (neighbor, msg);
        } else {
            sendCloseNodeRequest (serviceRequest, key, logicalDistance);
        }
        break;

    case SERVICE_NODE_REQUEST:
        /* Request for neighbor information broadcast by a physical neighbor. */
        if (lookupMyService (serviceRequest)) {
            sendRetrievalResponse (serviceRequest, serviceDescription);
        } else if ((neighbor = zone.isNeighborContain (key)) >= 0) {
            sendCloseNodeResponse (serviceRequest, neighbor, 0);
        } else {
            myDistance = zone.distance (key);
            /* Reply only if clearly closer than the requester. */
            if (myDistance < msg->logicalDistance - threshold)
                sendCloseNodeResponse (serviceRequest, address, myDistance);
        }
        break;

    case CLOSE_NODE_RESPONSE:
        addReply (msg);
        break;
    }
}



4 Simulation Results and Discussion 

We implemented a prototype of our scheme on top of the IEEE 802.11 MAC
and the AODV [10] routing protocol in the ns-2 simulator with the CMU wireless
extension. Two-dimensional CAN [6] was chosen as the basic construction
algorithm for the peer-to-peer structure. In a DHT-based P2P overlay, periodic
exchange of routing information is essential for structure maintenance, but it
is a large burden on the limited communication capacity of the MANET wireless
channel. Whereas most DHT structures maintain O(log n) neighbors, a
d-dimensional CAN requires only 2d neighbors. For simplicity and efficiency of
implementation, we chose 2-dimensional CAN.
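
To make the maintenance argument concrete, the short Python sketch below compares the 2d neighbors of a d-dimensional CAN with a logarithmic neighbor count for the network sizes used in the simulations. Taking the logarithm to base 2 and rounding up is an assumption made here for illustration; the O(log n) bound only fixes the growth rate, not the constant.

```python
import math

def can_neighbors(d):
    """Neighbors kept by a node in a d-dimensional CAN (2d)."""
    return 2 * d

def log_neighbors(n):
    """Illustrative neighbor count for an O(log n) DHT (base 2, rounded up)."""
    return math.ceil(math.log2(n))

for n in (25, 50, 100, 150):
    print(n, can_neighbors(2), log_neighbors(n))
```

With d = 2 the routing table stays at four entries regardless of the number of nodes, which keeps the periodic exchange small.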
[Figure 3; x-axis: Number of nodes]

Fig. 3. Route length with respect to the number of nodes



For comparison, we also implemented the basic flooding approach and the
original CAN algorithm, which does not use proximity-based acceleration in the
retrieval. The transmission range of each node is 250 m, and the channel bit
rate is 2 Mbps. The simulation area is 1500 m x 1500 m, and the number of nodes
varies from 25 to 150 in steps of 25; the nodes are randomly placed in the
area. In the initial stage, the nodes form a MANET and an overlay, and 50% of
the nodes register 3 services at 60-second intervals. During the total
simulation time of 20000 seconds, they periodically reregister their services,
and 30% of the nodes discover 10 randomly selected services every 50 seconds.

Figure 3 shows the route length of service discovery. In the original CAN
algorithm, the more nodes join the network, the longer the routes become,
because the request message follows only the logical structure. In the case
of flooding, a request message reaches all other nodes, so the shortest route
is established between the requester and the service provider. Our algorithm
achieves a route length similar to that of flooding because the physically
close node is chosen first and the logical structure is inspected next.

We estimated the communication overhead by counting the number of messages.
Fig. 4(a) shows the number of messages received by all nodes, and Fig. 4(b)
the number of messages sent from all nodes. As the number of nodes increases,
the number of messages in the flooding scheme grows geometrically. The
structured schemes are clearly superior in reducing the number of messages.
Interestingly, the number of messages of our algorithm is no more than that of
the original CAN in spite of the 1-hop broadcast overhead. The reduced route
length also reduces the total number of messages, so our algorithm achieves
shorter routes without excessive communication overhead.

Figure 5 depicts the probability of meeting the service owner in the middle of
routing. As we predicted, our UDD scheme raises the probability as the number
of nodes increases.

Fig. 5. Probability of meeting service provider in the mid of routing

The next result shows the effect on the route length in a mobile environment
with 100 nodes. The random waypoint mobility model is used with a maximum
speed of 40 m/s, and the maximum pause time varies from 0 to 40 seconds.
Regardless of the degree of mobility, all 3 schemes show results similar to
those of the static network. The number of messages shows no difference
either, because maintenance messages are counted only for the logical
structure; the underlying IP routing messages, however, increased due to node
mobility.



[Figure 6; x-axis: Pause time (secs); series: Flooding, CAN, Proximity-based]

Fig. 6. Route length in the mobile environment



5 Conclusion 

We proposed a scheme for accelerating service discovery on a DHT-based P2P
overlay network in MANETs. Using physical neighbor information, which can be
easily obtained in MANETs, the proposed algorithm achieves a significant
performance gain in terms of routing distance. The overhead due to the 1-hop
broadcast messages is offset by the decreased number of hops. Our algorithm was
implemented and evaluated with the CAN overlay network, and it is also
applicable to other DHT-based P2P overlay structures.
Abstract. As multimedia- and group-oriented computing becomes increasingly
popular for the users of wireless mobile networks, the importance of features
like quality of service (QoS) and multicasting support grows. Ad hoc networks
can provide users with the mobility they demand, if efficient QoS multicasting
strategies are developed. The ad hoc QoS multicasting (AQM) protocol
achieves multicasting efficiency by tracking resource availability within a
node’s neighbourhood and announcing it at session initiation. When nodes join a
session of a certain QoS class, this information is updated and used to select the 
most appropriate routes. AQM is compared to a non-QoS scheme with 
emphasis on service satisfaction of members and sessions in an environment 
with multiple service classes. By applying QoS restrictions, AQM improves the 
multicasting efficiency for members and sessions. The results show that QoS is 
essential for and applicable to multicast routing in ad hoc networks. 



1 Introduction 

The increasing popularity of video, voice and data communications over the Internet 
and the rapid penetration of mobile telephony have changed the expectations of 
wireless users. Voice communication is accompanied by multimedia and the need for 
group-oriented services and applications is increasing. Therefore, it is essential that 
wireless and multimedia be brought together [1]. Ad hoc networks are communication
groups formed by wireless mobile hosts without any infrastructure or centralised 
control, which can accompany these developments. 

Quality of service (QoS) support for multimedia applications is closely related to 
resource allocation, the objective of which is to decide how to reserve resources such 
that QoS requirements of all the applications can be satisfied [2]. However, it is a
significant technical challenge to provide reliable high-speed end-to-end 
communications in these networks, due to their dynamic topology, distributed 
management, and multihop connections [3]. In this regard, multicasting is a promising 
technique, the advantage of which is that packets are only multiplexed when it is 
necessary to reach two or more receivers on disjoint paths. 
It is not an easy task to incorporate QoS into ad hoc multicasting. Incremental
changes on existing schemes cannot address the critical issues mentioned above 
efficiently. In this paper, the ad hoc QoS multicasting (AQM) protocol is presented to 
improve multicasting efficiency through QoS management. AQM tracks availability 
of QoS within a node’s neighbourhood based on previous reservations in a network of 
multiple service classes, and announces it at session initiation. During the join 
process, this information is updated and used to select routes which can satisfy the 
QoS requirements of the session. Thus, AQM significantly improves the multicasting 
efficiency for members and sessions. The rest of this paper is organised as follows. 
Previous research related to ad hoc multicasting is summarised in Chapter 2. AQM is 
introduced in Chapter 3. The performance of the proposed system is evaluated in 
Chapter 4. Concluding remarks and future work are presented in Chapter 5. 



2 An Overview of Ad Hoc Multicasting Protocols



Several protocols have been developed to perform ad hoc multicast routing. However, 
they do not address the QoS aspect of ad hoc communication, which is becoming 
increasingly important as the demand for mobile multimedia increases. 

Associativity-based ad hoc multicast (ABAM) builds a source-based multicast tree 
[4]. Association stability, which results when the number of beacons received 
consecutively from a neighbour reaches a threshold, helps the source select routes 
which will probably last longer and need fewer reconfigurations. The tree formation is 
initiated by the source, whereby it identifies its receivers. To join a multicast tree, a 
node broadcasts a request, collects replies from group members, selects the best route 
with a selection algorithm, and sends a confirmation. To leave a tree, a notification is 
propagated upstream along the tree until a branching or receiving node is reached. 

Neighbour-supporting multicast protocol (NSMP) utilises node locality to reduce 
route maintenance overhead [5]. A mesh is created by a new source, which broadcasts
a flooding request. Intermediate nodes cache the upstream node information contained 
in the request, and forward the packet after updating this field. When the request 
arrives at receivers, they send replies to their upstream nodes. On the return path, 
intermediate nodes make an entry to their routing tables and forward the reply 
upstream towards the source. In order to maintain the connectivity of the mesh, the 
source employs local route discoveries by periodically sending local requests, which 
are only relayed to mesh nodes and their immediate neighbours to limit flooding 
while keeping the most useful nodes informed. 

Differential destination multicast (DDM) lets source nodes manage group 
membership, and stores multicast forwarding state information encoded in headers of 
data packets to achieve stateless multicasting [6]. Join messages are unicast to the 
source, which tests admission requirements, adds the requester to its member list, and 
acknowledges it as a receiver. The source needs to refresh its member list in order to 
purge stale members. It sets a poll flag in data packets and forces its active receivers 
to resend join messages. Leave messages are also unicast to the source. Forwarding 
computation is based on destinations encoded in the headers, where each node checks 
the header for any DDM block or poll flag intended for it. 
Multicast ad hoc on-demand distance vector (MAODV) routing protocol is
derived from AODV [7]. The multicast group leader maintains a group sequence
number and broadcasts it periodically to keep the routing information fresh. A node
wishing to join a multicast group generates a route request. Only the leader or 
members of the multicast group may respond to a join request by unicasting a route 
reply back to the requester, which selects the best from several replies in terms of 
highest sequence numbers and lowest hop count, and enables that route by unicasting 
a multicast activation message to its next hop. Intermediate nodes receiving the 
activation message unicast it upstream along the best route according to the replies 
they received previously. Nodes wishing to leave a group unicast a multicast 
activation message to their next hop with its prune flag set. 
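
The reply selection rule described above (highest sequence number first, then lowest hop count) can be sketched in Python as follows. The `RouteReply` record and the sample values are hypothetical and serve only to illustrate the ordering.

```python
from typing import NamedTuple

class RouteReply(NamedTuple):
    replier: str
    sequence_number: int
    hop_count: int

def best_reply(replies):
    """Prefer the highest sequence number; break ties by the lowest hop count."""
    return max(replies, key=lambda r: (r.sequence_number, -r.hop_count))

replies = [
    RouteReply("n3", sequence_number=12, hop_count=2),
    RouteReply("n7", sequence_number=14, hop_count=5),
    RouteReply("n9", sequence_number=14, hop_count=3),
]
print(best_reply(replies).replier)  # n9
```

The shortest route via n3 is not chosen because its sequence number is older; among the two fresh replies, the one with fewer hops wins.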

The on-demand multicast routing protocol (ODMRP) introduces the concept of a 
forwarding group [8]. Sources periodically broadcast join query messages to invite 
new members and refresh existing membership information. When a node receives a 
join query, it stores the upstream node address in its routing table. If the maximum 
hop count is not exceeded, it updates the join request using this table and rebroadcasts 
the packet. When a node decides to join a session, it broadcasts a join reply. When a 
node receives a join reply, it checks the table of next nodes to see if it is on the path to 
the source. If this is the case, it sets its forwarding group flag and broadcasts its own 
join reply after updating the table of next nodes. Periodic join requests initiated by the 
source must be answered by session members with join replies to remain in the group. 



3 The Ad Hoc QoS Multicasting Protocol 



As mobile multimedia applications and group communication become popular for 
wireless users, ad hoc networks have to support QoS for multicasting. A QoS strategy 
should handle the reservation of resources, the optimisation of loss and delay to 
acceptable levels, and the implementation of QoS classes efficiently. In the following 
sections, the structural components of AQM are defined, which address these issues. 
Design details include the usage of QoS classes and levels, session initiation and 
destruction, membership management, and neighbourhood maintenance. 

In this work, four QoS classes are suggested to represent a sample set of 
applications to be supported by the ad hoc network. Defining QoS classes limits the 
amount of information to be transmitted. It is otherwise impossible to forward a best 
QoS combination without making some assumptions or losing some valuable data. It 
is preferable that nodes inform others on the availability of certain QoS conditions 
and send updates only when they change. 



3.1 Session Initiation and Destruction 

A session is defined by its identity number, application type, QoS class and, if
predictable, duration and cost. A node starts a session by broadcasting a session
initiation packet (SES_INIT). Thus, it becomes a session initiator (MCN_INIT). A
table of active sessions (TBL_SESSION) is maintained at each node to keep the
information on the session definitions. Figure 1 shows the phases of session initiation.
[Figure 1 legend]
First arrival of SES_INIT: new entry to TBL_SESSION; new entry to TBL_MEMBER;
node status MCN_PRED; forwarded since QoS satisfied.
Re-arrival of SES_INIT: new entry to TBL_MEMBER; not forwarded since session
known.
End of SES_INIT: limit on hop count reached; not forwarded since QoS not
satisfied.

Fig. 1. The AQM session initiation process: SES_INIT is broadcast by MCN_INIT n0 for a new
session. It propagates through the network, informing all nodes from n1 to n8, which update
their TBL_SESSION and TBL_MEMBER. n9 is not informed since it is beyond the QoS limits
in terms of hop count, which is used as a measure of end-to-end delay. t_i < t_{i+1}, 0 ≤ i ≤ 3,
represent the relative timing of the messages



Using their session tables, nodes forward initiation packets of new sessions. A 
membership table (TBL_MEMBER) is used to denote the status of the predecessors
(MCN_PRED) which have informed the node about a particular multicast session, 
and the QoS status of the path from the session initiator up to that node via that 
predecessor. Session initiation packets are forwarded as long as QoS requirements are 
met. Before a packet is rebroadcast, each node updates its QoS information fields with 
the current QoS conditions. The packet is dropped if QoS requirements cannot be met 
any more, avoiding flooding the network unnecessarily. Hop count information in the 
packets is used to prevent loops. Successful propagation of session initiation data is 
an important factor for the efficiency of subsequent session joining processes. 
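
The forwarding rule for session initiation packets can be sketched as below. This is a simplified Python illustration, not the AQM implementation: nodes and packets are plain dictionaries, the hop count is the only QoS condition checked (the paper uses it as a measure of end-to-end delay), and the field names `hop_count`, `max_hops` and `outbox` are assumptions.

```python
def handle_ses_init(node, packet, prev_hop):
    """Process a SES_INIT packet; return True if the node rebroadcasts it."""
    sid = packet["session_id"]
    first_arrival = sid not in node["tbl_session"]
    if first_arrival:
        node["tbl_session"][sid] = {"qos_class": packet["qos_class"]}
    # Every arrival records the predecessor (MCN_PRED) and the path length via it.
    hops = packet["hop_count"] + 1
    node["tbl_member"].setdefault(sid, {})[prev_hop] = {
        "status": "MCN_PRED", "hops": hops,
    }
    qos_ok = hops < packet["max_hops"]      # assumed QoS test: hop count limit
    if first_arrival and qos_ok:
        # Update the QoS fields before rebroadcasting.
        node["outbox"].append(dict(packet, hop_count=hops))
        return True
    return False                            # session known, or QoS not satisfied

node = {"tbl_session": {}, "tbl_member": {}, "outbox": []}
pkt = {"session_id": 1, "qos_class": 0, "hop_count": 0, "max_hops": 3}
print(handle_ses_init(node, pkt, "n0"))   # True: first arrival, QoS satisfied
print(handle_ses_init(node, pkt, "n2"))   # False: session already known
```

A re-arrival still adds a membership entry for the second predecessor, so the node later has more than one upstream candidate when it joins.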

The session is closed by its initiator with a session destruction (SES_DESTROY)
message. Upon receiving it, all nodes clean their tables. Member nodes forwarding 
multicast data also free their resources allocated to that session. A node receiving a 
session destruction packet forwards it if it has also forwarded the corresponding 
initiation packet or is currently forwarding session data to at least one active session 
member. Thus, receivers of a closed session are forced to leave the session. 



3.2 Membership Management 

A node can directly join a session if it is already a forwarding node in that session.
Otherwise, it broadcasts a join request packet (JOIN_REQ) containing the session
information. The predecessors of the requester propagate it upstream as long as QoS
is satisfied. Ad hoc networks are highly dynamic, and available resources may have
changed considerably since the arrival of the QoS conditions with the session
initiation packet. Therefore, QoS conditions are checked at each node to make sure
that the currently available resources allow the acceptance of a new session.
Intermediate nodes maintain a temporary request table (TBL_REQUEST) to keep track
of the requests and replies they have forwarded and prevent false or duplicate
packet processing.

The forwarded request reaches nodes which are already members of that session
and can directly send a reply (JOIN_REP). Members of a session are the initiator, the
forwarders, and the receivers. Downstream nodes, having initiated or forwarded join
requests and thus waiting for replies, aggregate these and forward only the reply
offering the best QoS conditions towards the requester. The originator of the join
request selects the one with the best QoS conditions among the possibly several
replies it receives. It changes its status from predecessor to receiver (MCN_RCV)
and sends a reserve message (JOIN_RES) to the selected node which has forwarded
the reply.

Intermediate nodes check the reserve packet to see whether they are forwarders on 
the path from the selected replier to the requester. If this is the case, they change their 
status from predecessor to forwarder (MCN_FWD), reserve resources, and update
their membership tables to keep a list of successors. They send the message upstream. 

Eventually, the reserve message reaches the originator of the reply, which can be 
the session initiator with some or without any members, a forwarder, or a receiver. If 
the replier is the session initiator and this is its first member, it changes its status from 
initiator to server (MCN_SRV). If it is a receiver, it becomes a forwarder. In both 
cases, the replier records its successor in its member table and reserves resources to 
start sending multicast data. If the node is an active server or forwarder, it has already 
reserved resources. It only adds the new member to its member table and continues 
sending the regular multicast data. Figure 2 shows the phases of joining a session. 
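
The status changes triggered by a reserve message can be summarised as a small transition function. The Python sketch below is an interpretation of the rules stated above; the function name and its two boolean parameters are hypothetical, and table updates and resource reservation are left out.

```python
def on_join_res(status, is_replier, on_selected_path):
    """Return a node's status after it sees a JOIN_RES (simplified sketch)."""
    if is_replier:
        if status == "MCN_INIT":
            return "MCN_SRV"   # initiator gains its first member
        if status == "MCN_RCV":
            return "MCN_FWD"   # receiver starts forwarding to its new successor
        return status          # active server or forwarder: only the table grows
    if on_selected_path and status == "MCN_PRED":
        return "MCN_FWD"       # intermediate node on the selected QoS path
    return status              # all other nodes ignore the message

print(on_join_res("MCN_INIT", True, True))    # MCN_SRV
print(on_join_res("MCN_PRED", False, True))   # MCN_FWD
print(on_join_res("MCN_PRED", False, False))  # MCN_PRED
```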

Each time a request-reply-reserve process succeeds, intermediate nodes have 
enough routing and membership data to take part in the packet forwarding task. When 
a node sends multicast packets, its neighbours already know if they are involved in 
the session by checking their tables, one with information on their own membership 
status, and another with a list of multicast sessions they are responsible for forwarding.




[Figure 2 legend: (a) actual and repeated JOIN_REQ; (b) actual and repeated JOIN_REP]

Fig. 2. The AQM session joining process: (a) JOIN_REQ is issued by n5. It propagates through
the network as long as QoS can be satisfied, until it reaches some members of the session.
Nodes from n1 to n4 update their TBL_REQUEST as they forward the packet since they are not
session members. (b) JOIN_REP is sent back from MCN_INIT n0 to n5. It is forwarded by n1,
n2, n3, n4. (c) n5 sends JOIN_RES along the selected QoS path via n4, n2, n0, which reserve
resources and update their status. Other nodes ignore the message. t_i < t_{i+1}, 0 ≤ i ≤ 8,
represent the relative timing of the messages
A node needs to inform its forwarder on the multicast graph upon leaving a
session. After receiving a quit notification (SES_LEAVE), the forwarding node
deletes the leaving member from its member table. If this was its only successor
in that session, the forwarding node checks its own status. If the forwarding node
itself is not a receiver, it frees resources and notifies its forwarder of its own leave.
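
The leave procedure can be sketched as follows. The dictionary layout of the forwarding node (`successors`, `is_receiver`, `reserved`) is an assumption made for this Python illustration; the return value tells the caller whether the node must in turn notify its own forwarder.

```python
def on_ses_leave(forwarder, session_id, leaving_member):
    """Handle SES_LEAVE at a forwarding node; return True if it must leave upstream."""
    successors = forwarder["successors"][session_id]
    successors.discard(leaving_member)           # delete the member from the table
    if successors:
        return False                             # still forwarding to other successors
    if forwarder["is_receiver"].get(session_id, False):
        return False                             # keeps receiving the session itself
    forwarder["reserved"].pop(session_id, None)  # free the reserved resources
    return True                                  # notify its own forwarder

fwd = {"successors": {1: {"n5", "n6"}}, "is_receiver": {}, "reserved": {1: 128}}
print(on_ses_leave(fwd, 1, "n5"))  # False: n6 still depends on this node
print(on_ses_leave(fwd, 1, "n6"))  # True: no successors left, not a receiver
```

Applied hop by hop, this prunes a branch of the multicast graph up to the first node that still has another successor or is itself a receiver.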



3.3 Neighbourhood Maintenance 

The nodes in an ad hoc network have to maintain their connectivity information with 
as much accuracy as possible to support QoS. This includes the ability to keep track 
of available bandwidth within their transmission range, and provide their neighbours 
with valid routes when asked to take part in a request-reply-reserve process of a node 
wishing to join a multicast session. 

Each node broadcasts periodic greeting messages (NBR_HELLO), informing its
neighbours of its bandwidth usage, which is determined by the QoS classes of the
sessions being served or forwarded by that node. To reduce overhead, greeting
messages can be piggybacked onto control or data messages. Each node aggregates
the information in these messages into its neighbourhood table (TBL_NEIGHBOUR).
This table is used to calculate the total bandwidth currently allocated to
multicast sessions in the neighbourhood, which is the sum of all used capacities
of the neighbouring nodes for that time frame. Neighbourhood tables also help
nodes with their decisions on packet forwarding. Session initiation packets are
forwarded only if a node has neighbours other than its predecessors for that
session. If a node does not receive any greeting messages from a neighbour for a
while, it considers that neighbour lost and deletes it from its neighbourhood,
session and membership tables.

Due to the broadcasting nature of the wireless medium, free bandwidth is node- 
based, i.e. a node’s available bandwidth is the residual capacity in its neighbourhood. 
A node can only use the remaining capacity not used by itself and its immediate 
neighbours. This approach provides a sufficient method to measure bandwidth 
availability within a neighbourhood. 
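
A minimal Python sketch of this node-based bandwidth calculation is given below. It assumes that the neighbourhood table maps each neighbour to the usage it last reported in a greeting message, that all values are in Kbps with 1 Mbps taken as 1,000 Kbps, and that the maximum is the 10 Mbps used in the simulations; the function names are hypothetical.

```python
MAX_BANDWIDTH_KBPS = 10_000  # 10 Mbps, the maximum available bandwidth in Table 2

def available_bandwidth(own_usage, tbl_neighbour, max_bandwidth=MAX_BANDWIDTH_KBPS):
    """Residual capacity: maximum minus the usage of the node and its neighbours."""
    used = own_usage + sum(tbl_neighbour.values())
    return max(0, max_bandwidth - used)

def can_accept(required, own_usage, tbl_neighbour):
    """True if a new session of the given bandwidth fits into the neighbourhood."""
    return required <= available_bandwidth(own_usage, tbl_neighbour)

# Hypothetical usage reported by three neighbours, in Kbps.
tbl_neighbour = {"n1": 3000, "n2": 4000, "n3": 128}
print(available_bandwidth(256, tbl_neighbour))  # 2616
print(can_accept(3000, 256, tbl_neighbour))     # False: a class 2 session does not fit
print(can_accept(128, 256, tbl_neighbour))      # True: a class 0 session fits
```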



4 Performance Evaluation 



Simulations are repeated multiple times in a network with four service classes as 
defined in Table 1. Nodes generate their own sessions or join other nodes’ sessions 
with certain probabilities, which belong to one of these four classes. All simulation 
parameters are given in Table 2. The simulations are conducted using OPNET 
Modeler 10.0 Educational Version and Wireless Module [9]. The usage scenarios 
consist of open-air occasions such as search and rescue efforts or visits to nature in an 
area with boundaries, where a wired network infrastructure is not available. A node 
can take part in only one application at a time as a server or receiver. However, it can
participate in any number of sessions as a forwarder as long as QoS conditions allow. 

AQM nodes are modelled in three layers with application, session, and network 
managers. The application manager is responsible for selecting the type of application 
to run, setting its QoS class, and making decisions on session initiation/destruction or 
join/leave. The session manager is responsible for declaring new sessions initiated by
its application manager, sending requests for sessions it wishes to join, keeping lists 
of sessions, members and requests of other nodes, processing and forwarding their 
messages, and taking part in their join processes when necessary. The network 
manager is responsible for packet arrival and delivery, and for broadcasting periodic 
greeting messages to make the derivation of free bandwidth information possible. 



Table 1. QoS classes and requirements

QoS    Bandwidth    Average   Delay      Application
Class  Requirement  Duration  Tolerance  Type
-----  -----------  --------  ---------  ---------------------------
0      128 Kbps     1,200 s   10 ms      High-quality voice
1      256 Kbps     2,400 s   100 ms     CD-quality streaming audio
2      3 Mbps       1,200 s   10 ms      TV-quality video conference
3      4 Mbps       4,800 s   90 ms      High-quality video


Table 2. Simulation parameters

Parameter Description               Value
----------------------------------  ------------------------------------------
Area size                           1,000 m x 1,000 m
Greeting message interval           10 s
Maximum available bandwidth         10 Mbps
Node distribution (initial)         Uniform
Node idle times                     Exponential (300 s; 600 s; 900 s; 1,200 s)
Service class distribution          0: 40%; 1: 20%; 2: 30%; 3: 10%
Session generation / joining ratio  1/9
Simulation duration                 8 h
Wireless transmission range         200 m





Previous research efforts have mostly been evaluated through the use of important 
metrics which give a notion about the internal efficiency of a protocol. Two of these 
are data delivery ratio and control overhead [10]. However, the evaluation of QoS 
performance in ad hoc networks necessitates additional metrics. The main concern of 
this work is to evaluate the efficiency of AQM in providing multicast users with QoS 
and satisfying application requirements. Therefore, two new performance metrics, 
member- and session-level satisfaction grades, are introduced. 



4.1 The Grade of Member Satisfaction 



An important aspect of the QoS-related multicasting decisions made by AQM is the
improvement in the ratio of overloaded member nodes, which has a direct impact on
the satisfaction of session members regarding the multicasting service provided. On
the other hand, the same decisions lead the network to reject more join requests than a
non-QoS scheme. The member satisfaction grade S_Member is defined as the weighted
sum of these two components to evaluate the member-level success ratio of AQM:

$$ S_{Member} = \beta \left( 1 - \frac{o}{s + \alpha f} \right) + (1 - \beta)\, \frac{r}{q} \qquad (1) $$
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In (1), o represents the number of overloaded nodes, which have decided to serve 
and forward more sessions than is possible without exceeding the maximum available 
bandwidth, s is the total number of session servers, and/is the total number of session 
forwarders. The streaming nature of multimedia applications and the broadcasting 
nature of the wireless medium necessitate that session servers and forwarders have 
different bandwidth requirements within their neighbourhood. A server only takes its 
successors into consideration whereas a forwarder deals with its predecessor as well 
as its successors in terms of overload. Thus, the impact of overloaded neighbours on 
these nodes is not the same. To reflect this difference, /is multiplied by a coefficient 
a, which is set to 1.5 in the simulations. The division o/(s+a f) gives the ratio of 
overloaded nodes to all serving and forwarding nodes. Thus, the first term of the 
summation, multiplied by a relative weight coefficient (i. represents a member 
overload prevention rate. Continuing with the second term, r is the number of 
receivers, and q is the total number of join requests issued by all mobile nodes. Their 
ratio reflects the success of the scheme in satisfying a node’s request to join a session. 
The purpose of (i. which can be varied between 0 and 1, is to adjust the relative 
weight of one term over the other according to the preferences of the ad hoc network. 
To give equal weights to overload prevention and member acceptance, f> is set to 0.5 
in the simulations. Other values are possible to change the network preferences. 
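
The member-level grade can be illustrated with a minimal Python sketch. It assumes the grade has the form S_Member = β(1 − o/(s + αf)) + (1 − β)(r/q), which is how the term-by-term description above reads. The function name, argument names, and sample counts are illustrative assumptions and are not taken from the AQM simulations.

```python
def member_satisfaction(o, s, f, r, q, alpha=1.5, beta=0.5):
    """Member satisfaction grade, assuming the weighted-sum form described in the text.

    o: overloaded nodes, s: session servers, f: session forwarders,
    r: receivers (accepted joins), q: join requests issued.
    alpha and beta default to the values quoted for the simulations.
    """
    weighted_nodes = s + alpha * f
    if weighted_nodes <= 0 or q <= 0:
        raise ValueError("need at least one serving/forwarding node and one join request")
    overload_prevention = 1.0 - o / weighted_nodes  # first term, before weighting
    acceptance = r / q                              # second term, before weighting
    return beta * overload_prevention + (1.0 - beta) * acceptance


# Hypothetical counts: 2 overloaded nodes among 4 servers and 4 forwarders,
# 30 of 40 join requests accepted.
print(member_satisfaction(o=2, s=4, f=4, r=30, q=40))
```

Raising beta in this sketch shifts the grade toward overload prevention, while lowering it rewards admitting more join requests.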




[Figure 3. Panel (a): x-axis Number of Nodes (0 to 50). Panel (b): x-axis Supported Service Class (1, 2, 3, All). Legend: AQM, non-QoS.] 

Fig. 3. Comparison of the member satisfaction grades of AQM and a non-QoS scheme: (a) 
Under support for multiple service classes, (b) Under support for single vs. multiple service 
classes with 50 nodes 



Figure 3(a) compares the member satisfaction grades of AQM to a non-QoS 
scheme. In AQM, nodes do not accept more traffic than the bandwidth available in 
their neighbourhood. However, overloaded members still occur due to the hidden 
terminal problem. When QoS support is deactivated, nodes do not check their 
bandwidth limitations before replying to join requests. As a result of this, some of the 
serving or forwarding nodes become heavily overloaded, and their successors start 
suffering from collisions and packet losses. As the number of nodes grows, more 
requests are accepted per node without considering the available bandwidth, which 
causes a drastic decrease in member satisfaction grades. It can be concluded that the 
application of QoS restrictions significantly increases member satisfaction. 

Figure 3(b) compares AQM to the non-QoS scheme with regard to the supported 
QoS class in a 50-node network. In each of the first four simulation pairs, all 
generated sessions belong to a single QoS class. AQM outperforms the non-QoS 
scheme in all of these classes. Moreover, AQM’s overall performance increases as the 
network starts supporting multiple QoS classes. The reason for this improvement is 
that in AQM, sessions of lower classes can still be managed efficiently even if a join 
request for a higher class has been rejected due to QoS restrictions. While the 
application of QoS causes more users to be rejected, the lack of these restrictions 
forces users to experience difficulties in getting any service as network population 
grows and bandwidth requirements increase. 



4.2 The Grade of Session Satisfaction 

Rejection of some join requests and excessive bandwidth occupation by single nodes 
in a session affects all its members. It is necessary to observe the implications of these 
events on sessions. The session satisfaction grade S_Session is defined as the weighted 
sum of these two components to evaluate the session-level success ratio of AQM: 

\[
S_{Session} = \gamma \left( 1 - \frac{l}{m} \right) + (1 - \gamma) \left( 1 - \frac{j}{m} \right) \qquad (2)
\]

In (2), l is the number of sessions with at least one overloaded member, j is the 
number of sessions with at least one rejected join request, and m is the total number of 
sessions. The first term is the ratio of sessions without any overloaded members, 
whereas the second term reflects the success of AQM with regard to sessions without 
any rejections. The purpose of γ, which can be varied between 0 and 1, is to adjust the 
relative weight of one term over the other according to the preferences of the ad hoc 
network. To explicitly stress the effect of overloaded sessions on AQM, γ is set to 0.8 
in the simulations. Other values are possible to change the network preferences. 
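
As a rough illustration, the session-level grade can be derived from per-session counters. The sketch below assumes the form S_Session = γ(1 − l/m) + (1 − γ)(1 − j/m), following the description of the two terms above. The `SessionStats` record and its field names are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical per-session counters collected at the end of a run."""
    overloaded_members: int
    rejected_joins: int


def session_satisfaction(sessions, gamma=0.8):
    """Session satisfaction grade, assuming the weighted-sum form described in the text."""
    m = len(sessions)
    if m == 0:
        raise ValueError("at least one session is required")
    l = sum(1 for x in sessions if x.overloaded_members > 0)  # sessions with an overloaded member
    j = sum(1 for x in sessions if x.rejected_joins > 0)      # sessions with a rejected join request
    return gamma * (1.0 - l / m) + (1.0 - gamma) * (1.0 - j / m)


# Four hypothetical sessions: one has an overloaded member, another has rejections.
stats = [SessionStats(0, 0), SessionStats(1, 0), SessionStats(0, 2), SessionStats(0, 0)]
print(session_satisfaction(stats))
```

With gamma at 0.8, a session with an overloaded member costs four times as much as a session with a rejected join request, which matches the stated intent of stressing overloaded sessions.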

Fig. 4. Comparison of the session satisfaction grades of AQM and a non-QoS scheme: (a) 
Under support for multiple service classes, (b) Under support for single vs. multiple service 
classes with 50 nodes 

Figure 4(a) compares the session satisfaction grades of AQM to the non-QoS 
scheme. Since AQM prevents single nodes from being overloaded more efficiently, it 
also achieves improvements in session satisfaction. However, unsatisfied sessions still 
occur. Some nodes become overloaded as a result of the allocations made by their 
neighbours that cannot be aware of each other’s reservations due to the hidden 
terminal problem. When QoS support is deactivated, on the other hand, the lack of 
bandwidth restrictions causes more nodes to become overloaded, and as the network 
grows, more sessions are affected and session satisfaction decreases. 

Figure 4(b) compares AQM to the non-QoS scheme with regard to the supported 
QoS class in a 50-node network. In each of the first four simulation pairs, all 
generated sessions belong to a single QoS class. AQM outperforms the non-QoS 
scheme in all of these classes. Thus, AQM achieves better performance by decreasing 
the number of overloaded members and sessions, at the cost of an acceptably 
increased number of rejected nodes. 



5 Conclusion 



AQM is designed to improve multicasting efficiency through the management of 
resources within each node’s neighbourhood. It is compared to a non-QoS scheme in 
a realistic network scenario where multiple application service classes are supported. 
The primary evaluation criteria for AQM are service satisfaction grades defined both 
for members and sessions. Simulations show that, by applying QoS restrictions to the 
ad hoc network, AQM achieves significantly better results than a non-QoS scheme. 
Without QoS support, users experience difficulties in getting the service they demand 
as the network population grows and bandwidth requirements increase. AQM proves 
that QoS is essential for and applicable to ad hoc multimedia networks. It is not a 
realistic assumption that a mobile network can afford a pure on-demand scheme if it 
has to support QoS. AQM proposes a hybrid method in terms of multicasting with 
table-driven session management and on-demand verification of QoS information. 

An important research direction is keeping the QoS data up-to-date, which is a 
major concern for a node in AQM, and involves the handling of lost neighbours, data 
exchange, and interpretation of a node’s QoS status. A second issue closely related to 
QoS data accuracy is the hidden terminal problem. An extension to the request-reply- 
reserve process is necessary, whereby each replying node consults its neighbourhood 
to see if there are any objections. Neighbour awareness and discovery are typically 
handled by a periodic mechanism of beacons at the medium access control (MAC) 
layer. However, reliable MAC broadcast is a challenging task due to the request-to- 
send/clear-to-send (RTS/CTS) signalling problem. The MAC layer is also responsible 
for resource reservation and the acquisition of available link bandwidth information. 
However, AQM is independent of the design of lower layers. 







References 

1. Van Nee, R., Prasad, R.: OFDM for Wireless Multimedia Communications. Artech House 
Publishers (2000) 

2. Zhang, Q., Zhu, W., Wang, G.J., Zhang, Y.Q.: Resource Allocation with Adaptive QoS for 
Multimedia Transmission over W-CDMA Channels. In: Proceedings of IEEE WCNC, 
Chicago, USA (2000) 

3. Walrand, J., Varaiya, P.: High-Performance Communication Networks. 2nd edn. Morgan 
Kaufmann Publishers (2000) 

4. Toh, C.K.: Ad Hoc Mobile Wireless Networks. Prentice-Hall (2002) 

5. Lee, S., Kim, C.: A New Wireless Ad hoc Multicast Routing Protocol. Elsevier Science 
Computer Networks 38 (2002) 121-135 

6. Ji, L., Corson, M.S.: Explicit Multicasting for Mobile Ad Hoc Networks. Mobile Networks 
and Applications (MONET) 8(5) (2003) 535-549 

7. Royer, E.M., Perkins, C.E.: Multicast Ad Hoc On-Demand Distance Vector (MAODV) 
Routing. IETF MANET WG Internet Draft, work in progress (2000) 

8. Lee, S.J., Su, W., Hsu, J., Gerla, M.: On-Demand Multicast Routing Protocol (ODMRP) for 
Ad Hoc Networks. IETF MANET WG Internet Draft, work in progress (2000) 

9. OPNET Technologies Inc., Bethesda, MD, USA. Available at http://www.opnet.com 

10. Corson, S., Macker, J.: Mobile Ad Hoc Networking (MANET): Routing Protocol 
Performance Issues and Evaluation Considerations. RFC 2501 (1999) 



HVIA-GE: A Hardware Implementation of 
Virtual Interface Architecture Based 
on Gigabit Ethernet 



Sejin Park¹, Sang-Hwa Chung¹, In-Su Yoon¹, In-Hyung Jung¹, 
So Myeong Lee¹, and Ben Lee² 

¹ Department of Computer Engineering 
Pusan National University, Pusan, Korea 
{sejnpark, shchung, isyoon, jung, smlee3}@pusan.ac.kr 
² School of Electrical Engineering and Computer Science 
Oregon State University, USA 
benl@eecs.orst.edu 



Abstract. This paper presents the implementation and performance of 
the HVIA-GE card, which is a hardware implementation of the Virtual 
Interface Architecture (VIA) based on Gigabit Ethernet. VIA is a user- 
level communication interface for high performance PC clustering. The 
HVIA-GE card is a 32-bit/33MHz PCI adapter containing an FPGA 
for the VIA Protocol Engine (VPE) and a Gigabit Ethernet chip set 
to construct a high performance physical network. HVIA-GE performs 
virtual-to-physical address translation, doorbell, and send/receive com- 
pletion operations in hardware without kernel intervention. In particular, 
the Address Translation Table (ATT) is stored on the local memory of 
the HVIA-GE card, and the VPE efficiently controls the address trans- 
lation process by directly accessing the ATT. As a result, the communi- 
cation overhead during send/receive transactions is greatly reduced. Our 
experimental results show a minimum latency of 12.2 µs, and a maximum 
bandwidth of 96.3 MB/s. In terms of minimum latency, HVIA-GE 
performs 4.7 times and 9.7 times faster than M-VIA, a software imple- 
mentation of VIA, and TCP/IP, respectively, over Gigabit Ethernet. In 
addition, the maximum bandwidth of HVIA-GE is 52% and 60% higher 
than M-VIA and TCP/IP, respectively. 



1 Introduction 

As cluster computing becomes more popular due to the increase in network 
speed and the enhanced performance of computing nodes, a significant effort 
has been made to reduce the communication overhead between cluster nodes to 
maximize the overall performance. In particular, there has been much research 
effort in user-level communication to minimize kernel intervention, such 
as context switching and data copy between protocol layers. Examples include 
Active Messages, Fast Messages, U-Net, and VIA [8]. Among them, Virtual 
Interface Architecture (VIA) was proposed to standardize different features of 



C. Aykanat et al. (Eds.): ISCIS 2004, LNCS 3280, pp. 198-206, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004 







existing user-level protocols for System Area Network. VIA can be implemented 
in either software or hardware. The software implementations include M-VIA [3] 
and Berkeley VIA [4], and the hardware implementations include ServerNet II 
[6], and cLAN [5]. 

In this paper, Gigabit Ethernet is adopted as an underlying network to con- 
struct a VIA-based PC cluster. Since Gigabit Ethernet is a standard high-speed 
network for LAN and WAN, it has an advantage in terms of cost when compared 
with proprietary high performance networks, such as Myrinet and SCI. More- 
over, when VIA is adopted as a user-level interface on Gigabit Ethernet based 
clusters, most of the low-level bandwidth can be redeemed at the application level 
by removing the time-consuming TCP/IP protocol. Currently, there are a number 
of efforts to implement software versions of VIA based on Gigabit Ethernet 
using either M-VIA or Berkeley VIA [2], [1], [7], [9]. Meanwhile, Tandem/Compaq 
developed ServerNet II, a hardware version of VIA using Gigabit Ethernet as 
a physical network. ServerNet II uses its own switch, which supports wormhole 
routing with 512-byte packets, to connect cluster nodes. ServerNet II shows a 
minimum latency of 12 µs for 8-byte data and a bandwidth of 92 MB/s for 64 KB 
data using RDMA writes on a single Virtual Interface channel. Although the 
specific details of the implementation were not reported, the address translation 
table was not implemented in hardware because there is no memory on the card. 
cLAN is also implemented as a hardware VIA, and shows a minimum latency 
of 7 µs and a maximum bandwidth of 110 MB/s. Although cLAN shows better 
performance than ServerNet II, it is based on an expensive proprietary network, 
similar to Myrinet and SCI. 

This paper presents the design and implementation of HVIA-GE, which is 
a Hardware implementation of VIA based on Gigabit Ethernet. HVIA-GE is a 
PCI plug-in card based on 33MHz/32-bit PCI bus. An FPGA was used to imple- 
ment the VIA Protocol Engine (VPE) and a Gigabit Ethernet chip set was used 
to connect the VPE to Gigabit Ethernet. HVIA-GE performs virtual-to-physical 
address translations, send/receive operations including RDMA, and completion 
notifications fully in hardware without any intervention from the kernel. In particular, 
the Address Translation Table (ATT) is stored in the local memory of 
the HVIA-GE card, and the VPE efficiently performs the virtual-to-physical 
address translation. The PCI logic was directly implemented on the FPGA in- 
stead of using a commercial chip to minimize the latency of DMA initialization. 
The HVIA-GE cards can be connected to Gigabit Ethernet switches developed 
for LANs to form a cluster; therefore, a high performance but low cost cluster 
system can be easily constructed. 

2 VIA Overview 

VIA uses Virtual Interfaces (VIs) to reduce the communication overhead. 
A VI for each node functions as a communication endpoint, and VIs generated 
between two nodes establish a virtual communication channel. Each VI contains 
a Work Queue (WQ), which consists of a Send Queue and a Receive Queue. 
A send/receive transaction is initiated by posting a VI descriptor on the WQ, 
and the Network Interface Card (NIC) is notified of the send/receive transaction 
using a doorbell mechanism. Each VI descriptor contains all the information the 
NIC needs to process the corresponding request, including control information 
and pointers to data buffers. Then, the NIC performs the actual data transfer 
using DMA without any interference from the kernel. The send/receive transac- 
tion is completed when the VI descriptor’s done bit is set and the Completion 
Queue (CQ) is updated by setting the corresponding VI descriptor handle. 
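
The transaction flow just described (post a descriptor, ring the doorbell, let the NIC move the data, then observe the done bit and the CQ) can be mimicked with a toy Python model. This is only a sketch of the control flow: the class names, fields, and the list used as a "wire" are assumptions made for illustration and do not correspond to VIPL calls or to the HVIA-GE hardware.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    handle: int           # hypothetical identifier reported through the CQ
    buffer: bytearray     # stands in for the pointer to the data buffer
    length: int           # control information: bytes to transfer
    done: bool = False    # the descriptor's done bit


class ToyNic:
    """Toy NIC: consumes doorbells and completes descriptors without a kernel."""

    def __init__(self):
        self.doorbells = deque()         # pending doorbell notifications
        self.completion_queue = deque()  # handles of completed descriptors
        self.wire = []                   # stands in for data placed on the network

    def ring_doorbell(self, work_queue):
        # The host only signals that a descriptor is waiting on this queue.
        self.doorbells.append(work_queue)

    def process(self):
        while self.doorbells:
            work_queue = self.doorbells.popleft()
            desc = work_queue.popleft()
            self.wire.append(bytes(desc.buffer[:desc.length]))  # the "DMA" transfer
            desc.done = True                                    # set the done bit
            self.completion_queue.append(desc.handle)           # update the CQ


def post_send(nic, send_queue, desc):
    """Post a descriptor on the send queue, then notify the NIC."""
    send_queue.append(desc)
    nic.ring_doorbell(send_queue)


nic, send_queue = ToyNic(), deque()
desc = Descriptor(handle=7, buffer=bytearray(b"hello"), length=5)
post_send(nic, send_queue, desc)
nic.process()
print(desc.done, list(nic.completion_queue))
```

The point of the model is that the host touches only the queue and the doorbell; everything after that happens on the NIC side.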

3 Implementation of HVIA-GE 

Our implementation of the major components of VIA is highlighted as follows. 
First, WQ is stored in the host memory but the VI descriptors that are cur- 
rently being processed are copied and stored in the HVIA-GE card until the 
corresponding send/receive transactions are completed. Second, the send/receive 
completion notification mechanism is implemented using only the “done bit” in 
the status field of the VI descriptor. Third, the doorbell mechanism that notifies 
the start of a send/receive transaction is implemented using registers in the 
HVIA-GE card. Finally, since every send/receive operation requires a virtual-to- 
physical address translation, ATT is stored on the local memory implemented 
on the HVIA-GE card for efficient address translation. The VPE controls the 
address translation process directly based on the ATT. 




Fig. 1. HVIA-GE card block diagram 



Figure 1 shows the block diagram of the HVIA-GE card, which is a network 
adapter based on 33MHz/32-bit PCI bus. The PCI interface logic, VPE, the 
SDRAM controller, and the Gigabit Ethernet Controller (GEC) are all imple- 
mented using FPGA running at 33 MHz. The PCI interface logic is implemented 
directly on the FPGA, rather than using a commercial chip, such as PLX9054, 
to minimize the latency of the DMA initialization. National Semiconductor’s 
MAC (DP83820) and PHY (DP83861) are used to connect the card to Gigabit 
Ethernet. On the software side, the Virtual Interface Provider Library (VIPL) 
and the device driver were developed based on Linux kernel 2.4. The following 
subsections provide the specifics of the HVIA-GE implementation. 

3.1 VIA Protocol Engine and Gigabit Ethernet Controller 

As shown in Fig. 1, VPE and GEC are the core modules of HVIA-GE. VPE con- 
sists of Send/Receive FIFOs, ATT Manager, Protocol Manager, RDMA Engine, 
Doorbells, and local memory controller. It processes VIPL functions delivered 
to HVIA-GE through the PCI bus. In the case of VipRegisterMem, which is the 
VIPL function used to register a user buffer, the user buffer’s virtual address, 
physical address, and size are sent to HVIA-GE as function parameters. The 
ATT manager receives information regarding the user buffer (i.e., virtual and 
physical addresses) and stores them on ATT. 

When a send/receive request is posted to a send/receive queue, HVIA-GE 
is notified through the doorbell mechanism, and obtains the corresponding VI 
descriptor via DMA. Then, the VIA Protocol Manager reads the physical address 
of the user data through the ATT Manager. If the current transaction is a send, 
it initiates a DMA read operation for the user data in the host memory and 
transfers the data to the Tx buffer in the GEC via the Send FIFO. A send/receive 
transaction can also be implemented using RDMA, which enables a local CPU to 
read/write directly from/to the memory in a remote node without intervention 
of the remote CPU. An RDMA operation can be implemented as either an RDMA 
read or an RDMA write. If RDMA read is used, the local CPU must first send the 
request and then wait for the requested data to arrive from the remote node. 
Therefore, the RDMA Engine in HVIA-GE is based on RDMA write, which is 
more advantageous in terms of latency. 

Since HVIA-GE directly drives the Medium Access Control (MAC), GEC 
basically functions as a device driver for the MAC. GEC processes the initial- 
ization, transmit/receive, MAC management routines, and interfaces with the 
MAC using PCI. The operations of GEC are as follows: When a send transac- 
tion is executed, Tx Descriptor Controller receives the size of the data to be 
transmitted and the address of the remote node from VPE, and produces a Tx 
descriptor. Meanwhile, Tx Buffer Controller adds the header information to the 
data received from the Send FIFO, stores the packet on the Tx Buffer, and in- 
forms the MAC of the start of a transmission. Then, the MAC reads the packet 
from the Tx Buffer and transfers it to the PHY. 

3.2 Address Translation 

During a VIA send operation, the user data is transmitted directly from the 
sender’s user buffer to the receiver’s user buffer without producing a copy in 
the kernel memory. To support this zero copy mechanism, the following features 
must be implemented. First, once a user buffer is allocated for a send/receive 
operation, the virtual and physical addresses of the user buffer must be obtained 
and sent to ATT on the HVIA-GE card using PIO. Second, a user buffer area 
must be pinned down when it is registered so that it is not swapped out during 
send/receive operations. In our implementation, one of the Linux kernel’s features, 
kiobuf, is used to pin down the user buffer. The virtual address and the 
corresponding physical address of the user buffer obtained during the pin down 
process are saved on ATT. 

ATT is divided into ATT Level 1 and ATT Level 2. Each 24-byte entry of 
ATT Level 1 corresponds to one of the allocated user buffers and includes 
the number of physical pages of the user buffer, the virtual address and 
the size of the first page, and the ATT Level 2 pointer. ATT Level 2 stores the 
physical addresses (4 bytes each) of all the allocated pages for the corresponding 
user buffer. Since ATT is implemented on the HVIA-GE card, it is important 
to acquire enough space for the table and provide an efficient access mechanism. 
In our implementation, a 64 MB SDRAM is used to store the ATT. If the ATT 
supports 1024 VIs and each VI uses one user buffer, then the SDRAM can 
support up to 60 MB of user buffer for each VI. If only one page (4 KB) is 
allocated for each user buffer, the SDRAM can support more than 3 million user 
buffers. Therefore, the capacity of the ATT should be sufficient to support most 
practical applications. 

The access mechanism to ATT operates as follows. The kernel agent assigns 
a unique memory handle number to each user buffer in a linear fashion when 
it is allocated. An ATT Level 1 entry is also assigned in the same fashion by 
consulting the memory handle of the user buffer. Thus, the address of the entry 
can be calculated by multiplying the memory handle number by the entry size. 
The current ATT Level 2 pointer is calculated by adding the previous ATT Level 
2 pointer to the number of the pages of the previous entry. 

After a send/receive request is posted, HVIA-GE obtains the corresponding 
VI descriptor from WQ via DMA. The VI descriptor includes the corresponding 
memory handle and the virtual address of the user data. Then, the VPE reads 
the corresponding ATT Level 1 entry using the memory handle. This requires 
only one SDRAM access to read the entire entry, which is 24 bytes, in burst 
mode. The target address for the physical address at ATT Level 2 is determined 
by adding the ATT Level 2 pointer in the entry to the offset of the given virtual 
address. When a user buffer is allocated on the host memory, the start address 
of the user buffer can be at any place in the first page, which is indicated by the 
size of the first page in the entry. Thus, the size of the first page in the entry 
must be considered to properly determine the correct target address of ATT 
Level 2. Finally, the physical address of the user data is obtained by adding the 
physical address found at ATT Level 2 to the offset of the given virtual address. 
Therefore, the virtual-to-physical address translation can be processed using two 
SDRAM accesses. 
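
The two-level lookup can be sketched in Python as follows. This is a software model under stated assumptions, not the VPE logic: Python lists stand in for the on-card SDRAM, Level 2 is assumed to hold page-aligned physical addresses, and the "size of the first page" is assumed to be the number of buffer bytes between the buffer start and the end of its first page. All class and method names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096       # 4 KB pages, as in the text
L1_ENTRY_SIZE = 24     # bytes per ATT Level 1 entry, as in the text


@dataclass
class Level1Entry:
    num_pages: int         # number of physical pages backing the buffer
    virt_addr: int         # virtual address where the buffer starts
    first_page_size: int   # buffer bytes that fall inside its first page (assumption)
    l2_pointer: int        # index of the buffer's first slot in Level 2


class AddressTranslationTable:
    """Toy two-level ATT; lists stand in for the SDRAM on the card."""

    def __init__(self):
        self.level1 = []   # indexed by memory handle
        self.level2 = []   # page-aligned physical page addresses

    def register(self, virt_addr, phys_pages):
        """Record a pinned buffer and return its linearly assigned memory handle."""
        if self.level1:
            prev = self.level1[-1]
            l2_pointer = prev.l2_pointer + prev.num_pages  # previous pointer + previous page count
        else:
            l2_pointer = 0
        first_page_size = PAGE_SIZE - (virt_addr % PAGE_SIZE)
        self.level1.append(Level1Entry(len(phys_pages), virt_addr, first_page_size, l2_pointer))
        self.level2.extend(phys_pages)
        return len(self.level1) - 1

    @staticmethod
    def level1_address(handle):
        """Byte address of a Level 1 entry: handle times entry size."""
        return handle * L1_ENTRY_SIZE

    def translate(self, handle, virt_addr):
        entry = self.level1[handle]                      # access 1: Level 1 entry
        offset = virt_addr - entry.virt_addr
        if offset < 0:
            raise ValueError("address precedes the registered buffer")
        if offset < entry.first_page_size:
            page_index = 0
            page_offset = PAGE_SIZE - entry.first_page_size + offset
        else:
            rest = offset - entry.first_page_size
            page_index = 1 + rest // PAGE_SIZE
            page_offset = rest % PAGE_SIZE
        if page_index >= entry.num_pages:
            raise ValueError("address lies beyond the registered buffer")
        page_base = self.level2[entry.l2_pointer + page_index]  # access 2: Level 2 slot
        return page_base + page_offset


att = AddressTranslationTable()
h = att.register(0x10000F00, [0x00400000, 0x00900000])  # buffer starts mid-page
print(hex(att.translate(h, 0x10001000)))
```

In this model each translation reads exactly one Level 1 entry and one Level 2 slot, mirroring the two SDRAM accesses counted in the text.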

Unlike the approach described here, it is also possible to implement ATT 
on the host memory. In this case, VPE has to access ATT via DMA reads. 
Although this method has an advantage in terms of the hardware cost, it takes 
about 3 times longer to access the ATT on the host memory than on SDRAM 
of the HVIA-GE card. Since ATT needs to be accessed for each send/receive 
operation, this overhead is significant, particularly when an application involves 
frequent communications between nodes in a cluster. 



4 Experimental Results 

The performance of the HVIA-GE card was evaluated using two 800 MHz Pentium 
III PCs with a 32-bit/33 MHz PCI bus. The PCs were running RedHat 
7.2 with Linux kernel 2.4. Also, for comparison purposes, the performance of 
TCP/IP and M-VIA was measured using AceNIC’s Gigabit Ethernet card. The 
latency and bandwidth of HVIA-GE were measured by using a ping-pong pro- 
gram developed using VIPL. The performance of M-VIA was measured using 
the vnettest program included with the M-VIA distribution. The performance 
of TCP/IP was measured by modifying the vnettest program using the socket 
library. 

4.1 Performance Comparison of HVIA-GE, M-VIA, and TCP/IP 




(a) Latency (b) Bandwidth 

Fig. 2. Latency and Bandwidth Comparisons 



Fig. 2 shows the latency and bandwidth results of HVIA-GE, M-VIA, 
and TCP/IP with the Ethernet MTU size of 1,514 bytes. The latency reported is 
one-half the round-trip time and the bandwidth is the total message size divided 
by the latency. The latency and bandwidth of HVIA-GE are measured using the 
RDMA write on a single VI channel. The minimum latency results of HVIA-GE, 
M-VIA, and TCP/IP for 4 bytes of user data are 12.2 µs, 57.6 µs, and 117.9 µs, 
respectively. Thus, the minimum latency of HVIA-GE is 4.7 and 9.7 times 
lower than M-VIA and TCP/IP, respectively. The maximum bandwidth results 
for 256 KB of user data are 96.3 MB/s, 63.5 MB/s, and 60 MB/s for HVIA-GE, 
M-VIA, and TCP/IP, respectively. Thus, HVIA-GE achieves 52% and 60% 
higher bandwidth than M-VIA and TCP/IP, respectively. 
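
The comparison figures can be rechecked from the quoted measurements with a few lines of Python. The sketch only restates the arithmetic (latency as half the round-trip time, bandwidth as message size over latency, and ratios between systems); the dictionary and function names are illustrative. From the quoted values, the latency ratios come out near 4.7 and 9.7, and the bandwidth gains near 52% and 60%, in line with the abstract.

```python
# Measurements quoted in the text (latency in microseconds, bandwidth in MB/s).
latency_us = {"HVIA-GE": 12.2, "M-VIA": 57.6, "TCP/IP": 117.9}
bandwidth_mbs = {"HVIA-GE": 96.3, "M-VIA": 63.5, "TCP/IP": 60.0}


def one_way_latency(round_trip_time):
    """Latency as reported in the text: one-half of the ping-pong round-trip time."""
    return round_trip_time / 2.0


def bandwidth(message_bytes, latency_seconds):
    """Bandwidth as reported in the text: message size divided by latency."""
    return message_bytes / latency_seconds


def latency_ratio(other, base="HVIA-GE"):
    """How many times longer the other system's minimum latency is."""
    return latency_us[other] / latency_us[base]


def bandwidth_gain_percent(other, base="HVIA-GE"):
    """Percentage by which the base system's bandwidth exceeds the other's."""
    return (bandwidth_mbs[base] / bandwidth_mbs[other] - 1.0) * 100.0


for name in ("M-VIA", "TCP/IP"):
    print(f"{name}: latency x{latency_ratio(name):.2f}, "
          f"bandwidth +{bandwidth_gain_percent(name):.1f}%")
```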

The minimum latency of HVIA-GE can be analyzed as follows. In the sender, 
the processing time required by the VPE and GEC is approximately 5.6 µs, and 
the data transmission time from the Tx Buffer of the sender to the Rx Buffer of the 
receiver is approximately 4.1 µs. In the receiver, the time required to transfer 
data from the Rx Buffer via GEC and the VPE to the PCI bus is approximately 
1.5 µs. Thus, in our implementation, the time spent in the host to call the send 
VIPL function, update WQ, and post the corresponding transaction is less than 
1 µs. 
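
As a quick consistency check on this breakdown, the three hardware components can be subtracted from the measured 12.2 µs minimum latency. The component values are the approximate figures quoted above, so the remainder is only indicative:

```latex
\[
t_{\text{host}} \approx 12.2 - (5.6 + 4.1 + 1.5) = 12.2 - 11.2 = 1.0\ \mu\text{s}
\]
```

The remainder of about 1 µs agrees with the stated host-side cost to within the rounding of the three components.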



4.2 Performance Comparison of Send/Receive and RDMA Write 
on HVIA-GE 

Latency and Bandwidth. Fig. 3 shows the performance difference between 
RDMA write and send/receive. As shown in Fig. 3-a, the latency difference 
is on average 2.9 µs regardless of data size. Fig. 3-b represents the difference 
in bandwidth. These improvements are obtained because RDMA write does not 
require a VI descriptor to be created and posted to WQ at the target node. Thus, 
the completion mechanism that sets the done bit in the VI Descriptor’s status 
field is also not necessary at the target node. Although the improvement is not 
significant for large data, it is meaningful to the applications that communicate 
using data size below 10 to 20 KB. 





(a) Difference in Latency (b) Difference in Bandwidth 

Fig. 3. Differences in Latency and Bandwidth 



CPU Utilization. Unlike the RDMA write, a send/receive requires some VIPL 
functions such as VipPostRecv and VipRecvWait at the receiver. VipPostRecv 
generates a VI descriptor, posts it to WQ and waits for HVIA-GE to read it. 
VipRecvWait waits for the message to arrive and completes the transaction when 
the message is DMAed to the user buffer. Fig. 4 shows the CPU times spent for 
VipPostRecv and VipRecvWait. The CPU time spent for VipPostRecv is about 
6.1 µs regardless of data size. However, the CPU time spent for VipRecvWait 
increases in proportion to the data size. Thus, this represents a significant portion 
of the latency for applications that communicate using large data. In the case of 
RDMA write, VipPostRecv and VipRecvWait are not necessary; thus, the RDMA 
write is superior to the send/receive in terms of CPU utilization. 

(a) CPU time for VipPostRecv (b) CPU time for VipRecvWait 

Fig. 4. CPU time in Receive Node. 



5 Conclusions and Future Work 

In this paper, we presented the design and performance of HVIA-GE that im- 
plements the VIA protocol in hardware. The HVIA-GE card contains VPE 
and GEC, and supports virtual-to-physical address translation, doorbell, RDMA 
write, and send/receive completion operations completely in hardware without 
intervention from the kernel. In particular, ATT is stored in the local memory 
on the HVIA-GE card and VPE directly and efficiently controls the address 
translation process. 

Our experiment with HVIA-GE shows a minimum latency of 12.2 µs, and a 
maximum bandwidth of 96.3 MB/s. These results indicate that the performance 
of HVIA-GE is much better than M-VIA and TCP/IP, and is comparable with 
that of ServerNet II. If applications with frequent send/receive transactions are 
executed, the performance of HVIA-GE will be better than that of ServerNet 
II because of the hardware implementation of ATT. In addition, it is easier to 
construct a low-cost, high performance cluster system compared to ServerNet II 
because HVIA-GE is connected to Gigabit Ethernet using the general Gigabit 
Ethernet switches developed for LAN. In our experiment with HVIA-GE, the 
RDMA write is better than the send/receive in terms of latency and bandwidth, 
particularly with data size below 10 to 20 KB. In addition, the RDMA write is 
superior to the send/receive in terms of CPU utilization for applications that 
communicate using large data. 

As future work, the HVIA-GE card will be further tested with real applications 
such as a video streaming server. We also plan to make the HVIA-GE card 
support 64-bit data width. This means that the HVIA-GE card will support the 
64-bit/66MHz PCI bus for the system and the MAC, and all the data paths of 
VPE and SDRAM will be 64 bits. The PCI interface of the HVIA-GE card can be 
further upgraded to support PCI-X and PCI Express. With these enhancements, 
the performance of the HVIA-GE card will be significantly improved. 

Abstract. So far, several analytic models have been proposed to predict 
the performance of the transmission control protocol (TCP), such as its 
steady-state throughput. However, for a more detailed performance analysis of 
various TCP implementations, the fast recovery latency during which 
lost packets are retransmitted should be considered based on the relevant 
strategy. In this paper, we derive the loss recovery latency of three 
TCP implementations: TCP Reno, TCP NewReno, and TCP with the selective 
acknowledgement (SACK) option. Specifically, the number of 
lost packets each TCP sender detects and retransmits during fast recovery 
is considered. Thereby, the proposed model can differentiate the loss 
recovery latency of TCP using the SACK option from that of TCP NewReno. 
Through numerical results verified by simulations, we show that the proposed 
model can capture the precise latency of the TCP loss recovery period. 



1 Introduction 

As a reliable transport layer protocol in the Internet [15], the transmission control 
protocol (TCP) provides congestion control to avoid the performance 
degradation known as ‘congestion collapse’ [6]. In TCP congestion control, 
a function for detecting and recovering from a packet loss is implemented, 
called loss recovery for short. The loss recovery mechanism works based on two 
basic algorithms, fast retransmit and fast recovery [1]. If loss recovery is 
successful, the sender need not wait for a retransmission timeout (RTO). Generally, 
the frequency of RTOs has a crucial effect on overall TCP performance, and 
preventing unnecessary RTOs is also a very important issue [9,10]. 

The original fast recovery algorithm implemented in TCP Reno [1] has the 
problem that RTO occurs frequently when multiple packets are lost in a window 
at the same time. To overcome this problem, the fast recovery algorithm has 
been modified; the modified version is usually called TCP NewReno [5]. Another 
alternative to this end is using the 
selective acknowledgement (SACK) option [7,2].¹ 

¹ In this paper, our scope is limited to TCP implementations and relevant mechanisms 
widely approved and released by the Internet Engineering Task Force (IETF). For 
simplicity, we call TCP using the SACK option TCP SACK in the rest of this paper. 

There has been a lot of work to analyze and predict TCP performance 
by modeling the evolution of the TCP window based on its cyclic characteristic 
[14,3,9,10]. Two well-known papers [14,3] derive the expected duration of the loss 
recovery period to approximate TCP throughput. In the derivation of the loss 
recovery duration in [3], only successive RTOs are considered. In the extended 
model [3], it is assumed that fast recovery always continues for a single 
round-trip time (RTT) regardless of the number of packet losses recovered by 
retransmissions. The assumption may be true because they consider only TCP Reno, 
which can hardly recover more than two packet losses without RTO [9]. 

However, the fast recovery behavior of TCP NewReno and SACK differs greatly
from that of TCP Reno in that multiple packet losses in a window can be
recovered without RTO if fast retransmit is successful [4]. The fast recovery
of TCP NewReno and SACK may last for several RTTs, depending on the number of
retransmissions, which means that the assumption of ‘a single RTT fast
recovery’ does not fit these two TCPs. In particular, there is no difference
between TCP NewReno and TCP SACK in the capability of handling multiple
packet losses under the assumption of successful fast retransmit initiation.
The only difference is that TCP SACK can recover more than one packet loss
per RTT [2], while TCP NewReno can recover only a single packet loss per RTT
[5].
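To make the contrast with the single-RTT assumption concrete, the following is a minimal sketch rather than the model derived in this paper. It assumes that fast retransmit has been triggered successfully and that a fixed number of losses is repaired in every RTT. The function name and the parameter `per_rtt` are hypothetical: `per_rtt=1` mirrors the NewReno behavior of one recovered loss per RTT, while a larger value merely stands in for SACK.

```python
import math


def fast_recovery_rtts(n_losses: int, per_rtt: int = 1) -> int:
    """Rough number of RTTs that fast recovery lasts for n_losses lost packets.

    per_rtt is a hypothetical knob (not a parameter of the paper's model):
    the number of lost packets repaired in each RTT.
    """
    if n_losses < 1 or per_rtt < 1:
        raise ValueError("n_losses and per_rtt must be positive")
    # Each RTT repairs at most per_rtt losses, so round up.
    return math.ceil(n_losses / per_rtt)


# Compare one-loss-per-RTT recovery with a two-losses-per-RTT stand-in.
for n in (1, 2, 4):
    print(n, fast_recovery_rtts(n), fast_recovery_rtts(n, per_rtt=2))
```

Under these assumptions the duration grows with the number of losses, which a fixed single-RTT fast recovery cannot express.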

Consequently, under the assumption that fast recovery always lasts for a
single RTT, it is impossible to reflect the benefit of using the SACK
option. Therefore, we further extend the previous work and propose a new
model based on the model developed in [9,10]. With the proposed model, we can
derive the expected fast recovery latency of TCP Reno, NewReno, and SACK,
taking the number of retransmissions into account.

The remainder of this paper is organized as follows. Section 2 briefly
presents TCP loss recovery behaviors and derives the loss recovery
probability in terms of the number of retransmissions. In Section 3 we derive
the loss recovery latency of TCP based on the loss recovery probability
derived in Section 2. Section 4 contains the numerical results and their
discussion. Finally, conclusions are summarized in Section 5.



2 TCP Loss Recovery Model 

2.1 Cyclic Behavior of TCP Congestion Window 

While a TCP connection is maintained, the evolution of the congestion window
can be modeled as a sequence of successive cycles [14,9,10]. Each cycle ends
with packet loss detection, as shown in Fig. 1.

After a TCP connection is established, the TCP sender starts to transmit
packets in the slow start phase. If there is no packet loss until the
congestion window reaches the slow-start threshold [5], the window continues
to increase in the congestion avoidance phase. As long as no packet is lost,
the congestion window keeps increasing, which eventually leads to one or more
packet losses, and the current cycle comes to an end. When a packet loss
occurs, there are two ways to recover it: by retransmission or by RTO. If the
sender can detect the packet loss and retransmit the packet, the next cycle
starts in congestion avoidance with the halved congestion window, as in the
(i + 1)th cycle in Fig. 1. However, if the sender cannot recover the packet
loss by retransmission, it must wait for RTO expiry and restart the next cycle
in slow start with the initial value of the congestion window.

Fig. 1. Cyclic evolution of TCP congestion window
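The cyclic behavior described above can be illustrated with a small round-based simulation. This is only a sketch under assumptions that are not part of the paper's model: each packet is lost independently with probability `p`, a detected loss ends in RTO with a fixed hypothetical probability `p_rto`, and the recovery itself takes no time. The default values of `ssthresh` and `w_max` are arbitrary.

```python
import random


def simulate_cwnd(rounds, p, ssthresh=16, w_max=32, p_rto=0.3, seed=0):
    """Return the congestion window (in packets) observed in each round."""
    rng = random.Random(seed)
    cwnd, trace = 1, []
    for _ in range(rounds):
        trace.append(cwnd)
        # A round has a loss if any of its cwnd packets is lost.
        loss = any(rng.random() < p for _ in range(cwnd))
        if not loss:
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)  # slow start
            else:
                cwnd += 1                       # congestion avoidance
            cwnd = min(cwnd, w_max)             # receiver's advertised window
        elif rng.random() < p_rto:
            # RTO: the next cycle restarts in slow start.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        else:
            # Recovery by retransmission: the next cycle starts in
            # congestion avoidance with the halved window.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
    return trace


print(simulate_cwnd(30, p=0.02))
```

Plotting the returned trace against the round index gives a sawtooth in which each cycle ends at a loss, and cycles that follow an RTO begin again from a window of one packet.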

The total latency of a connection can be divided into two types of period:
periods of good packet transmission and periods of loss recovery. The term
‘loss recovery period’ in this paper means the time spent on recovering a
lost packet, i.e., either a fast recovery period or an RTO period. The average
throughput of the connection can be approximated by the ratio of the number
of packets (or bytes) successfully transmitted to the total latency [14].
Since the fast recovery algorithm of each TCP adopts a different strategy,
the frequency of RTO invocation also differs for the same packet loss rate.
The rest of this paper addresses this feature in detail.
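The throughput approximation stated above amounts to a single ratio. The sketch below uses hypothetical argument names and simply splits the total latency into the two kinds of period named in the text.

```python
def average_throughput(packets_delivered: int,
                       good_time: float,
                       recovery_time: float) -> float:
    """Approximate throughput in packets per unit time.

    Total latency is the time spent in good packet transmission plus the
    time spent in loss recovery (fast recovery or RTO periods).
    """
    total = good_time + recovery_time
    if total <= 0:
        raise ValueError("total latency must be positive")
    return packets_delivered / total


# Same delivered packets, longer loss recovery: lower throughput.
print(average_throughput(1000, 8.0, 2.0))
print(average_throughput(1000, 8.0, 12.0))
```

Because frequent RTOs lengthen the loss recovery time, they lower the ratio even when the number of delivered packets is unchanged.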



2.2 Loss Recovery Probability 



Before deriving the loss recovery latency, it is necessary to obtain each TCP’s
loss recovery probability in terms of the number of retransmissions. To model
the evolution of the congestion window and obtain its stationary distribution,
we mainly follow the procedures presented in [13] under the same assumptions:
fixed packet size, random packet losses with probability p, no ACK loss, and
infinite packet transmission. We also adopt some notation from [13], such as W_max
for receiver’s advertised window and K for 

In modeling the loss recovery behaviors of each TCP, we adopt the concept of
‘round’ defined in [14] and ‘loss window’ in [9,10,13]. If we denote a loss
window by Ω and the ith packet loss in Ω by l_i, the first packet that Ω
includes is always l_1. Additionally, we define Φ_k as the number of packets
that are newly transmitted in the kth round during loss recovery. For n packet
losses in Ω that includes u packets, Φ_0 is always equal to u - n.
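As an illustration of the random-loss assumption, the number of losses in a loss window can be given a simple distribution. The sketch below is not taken from the paper; it assumes that the loss window of u packets starts with a lost packet and that each of the remaining u - 1 packets is lost independently with probability p. The function name is hypothetical.

```python
from math import comb


def prob_n_losses(n: int, u: int, p: float) -> float:
    """P(exactly n losses in a loss window of u packets).

    The first packet of the window is lost by definition; the other u - 1
    packets are assumed to be lost independently with probability p.
    """
    if not 1 <= n <= u:
        return 0.0
    # Choose which n - 1 of the remaining u - 1 packets are lost.
    return comb(u - 1, n - 1) * p ** (n - 1) * (1 - p) ** (u - n)


u, p = 8, 0.05
for n in range(1, 4):
    # u - n packets of the window arrive intact.
    print(n, u - n, round(prob_n_losses(n, u, p), 4))
```

For small p most loss windows contain a single loss, but the probability of multiple losses in one window grows with the window size u.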




